Abstract

We extend the spatial Λ-Fleming-Viot process introduced in [BEV10] to incorporate
recombination. The process models allele frequencies in a population which is
distributed over the two-dimensional torus T(L) of sidelength L and is subject to two
kinds of reproduction events: small events of radius O(1) and much rarer large events
of radius O(L^α) for some α ∈ (0, 1]. We investigate the correlation between the times
to the most recent common ancestor of alleles at two linked loci for a sample of size
two from the population. These individuals are initially sampled from 'far apart' on
the torus. As L tends to infinity, depending on the frequency of the large events, the
recombination rate and the initial distance between the two individuals sampled, we
obtain either a complete decorrelation of the coalescence times at the two loci, or a
sharp transition between a first period of complete correlation and a subsequent period
during which the remaining times needed to reach the most recent common ancestor at
each locus are independent. We use our computations to derive approximate probabilities
of identity by descent as a function of the separation at which the two individuals
are sampled.
1 Introduction



1.1 Background 

In the 30 years since its introduction, Kingman's coalescent has become a fundamental tool 
in population genetics. It provides an elegant description of the genealogical trees relating 
individuals in a sample from a highly idealised biological population, in which it is assumed 
that all individuals are selectively neutral and experience identical conditions, and that 
population size is constant. Spurred on by the flood of DNA sequence data, theoreticians 
have successfully extended the classical coalescent to incorporate more realistic biological 
assumptions such as varying population size, natural selection and genetic structure. How- 
ever, it has proved surprisingly difficult to produce satisfactory extensions for populations 
living (as many do) in continuous two-dimensional habitats - a problem dubbed the pain in 
the torus by Felsenstein (1975). 

In the classical models of population genetics, it is customary to assume that populations 
are either panmictic, meaning in particular that they have no spatial structure, or that they 
are subdivided into 'demes'. The demes sit at the vertices of a graph which is chosen to 
caricature the geographic region in which the population resides. Thus, for example, for a 
population living in a two-dimensional spatial continuum one typically takes the graph to 
be (a subset of) ℤ^2. Reproduction takes place within demes and interaction between the
subpopulations is through migration along the edges of the graph. Models of this type are 
collectively known as stepping stone models. 

However, in order to apply a stepping stone model to populations that are distributed 
across continuous space, one is forced to make an artificial subdivision. Moreover, the 
predictions of stepping stone models fail to match observed patterns of genetic variation. 
For example, they overestimate genetic diversity (often by many orders of magnitude) and 
they fail to predict the long-range correlations in allele frequencies seen in real populations. 

In recent work ([Eth08, BEV10, BKE10]) we introduced a new framework in which to
model populations evolving in a spatial continuum. The key idea, which enables us to
overcome the pain in the torus, is that reproduction is driven by a Poisson process of events
which are based on geographical space rather than on individuals. This leads, in particular,
to a class of models that could reasonably be called continuum stepping stone models, but
it also allows one to incorporate large-scale extinction/recolonisation events. Such events
dominate the demographic history of many species. They appear in our framework as 'local
population bottlenecks'. In [BKE10], we show (numerically) how the inclusion of such events
can lead to long-range correlations in allele frequencies. In [BEV10], a rigorous mathematical
analysis of a class of models on a two-dimensional torus illustrates the reduction in genetic diversity
that can result from such large-scale demographic events. We expand further on this in §2.
Thus large-scale events provide one plausible explanation of the two deficiencies of stepping 
stone models highlighted above, but of course they are not the only possible explanation. 

A natural question now arises: how could we infer the existence of these large-scale events
from data? One possible answer is through correlations in patterns of variation at different
genetic loci. Recall that in a diploid population (in which chromosomes are carried in
pairs) correlations between linked genes (that is, genes occurring on the same chromosome)
are broken down over time by recombination (which results in two genes on the same
chromosome being inherited from different chromosomes in the parent). We say that genes
are loosely linked if the rate of recombination events is high (for example if the chance of
a recombination in a single generation is O(1)). In the Kingman coalescent, genealogies
relating loosely linked genes evolve independently. This is because on the timescale of the 
coalescent, the states in which lineages ancestral to both loci are in the same individual 
vanish instantaneously. It is well known that if a population experiences a bottleneck, this
is no longer the case. As we trace backwards in time, when we reach the bottleneck, we
expect to see a significant proportion of surviving lineages coalesce at the same time and 
so we see correlations in genealogies even at unlinked loci. With local bottlenecks we can 
expect a rather more complicated picture. The degree of correlation across loci will depend
on the spatial separation of individuals in the sample.

The purpose of this paper is to extend the model of [BEV10] to diploid populations,
to incorporate recombination, and to provide a first rigorous analysis of the correlations in
genealogies at different loci in the presence of local extinction/recolonisation events. Since
the questions we shall address and some of the methods we shall use here are related to
those of [BEV10], the reader may find it useful to have some familiarity with the results of
that paper.

1.2 The model 

In [BEV10], we introduced the spatial Λ-Fleming-Viot process as a model of a haploid
population evolving in a spatial continuum. It is a Markov process taking its values in
the set of functions which associate to each point of the geographical space a probability
measure on a compact space, K, of genetic types. If ϱ is the current state of the population
and x is a spatial location, the measure ϱ(x) can be interpreted as the distribution of the
type of an individual sampled from location x. The dynamics are driven by a Poisson point
process of events. An event specifies a spatial region, A say, and a number u ∈ (0, 1]. As
a result of the event, a proportion u of individuals within A are replaced by offspring of a
parent sampled from a point picked uniformly at random from A. In [BEV10], the regions
A are chosen to be discs of random radius (whose centres fall with intensity proportional to
Lebesgue measure) and the distribution of u can depend on the radius of the disc. Under
appropriate conditions, existence and uniqueness in law of the process were established.

Here we wish to extend this framework in a number of directions. First, whereas in
[BEV10] a single parent was chosen from the region A, here we allow A to be repopulated
by the offspring of a finite (random) number of its inhabitants. Second, we assume that
the population is diploid. We shall follow (neutral) genes at two distinct (linked) loci,
with recombination acting between them. Writing K_1 and K_2 for the possible types at the
two loci, the type of an individual is an element of K_1 × K_2 (which we can identify with
[0, 1] × [0, 1]). As in [BEV10], we work on the torus T(L) of side L in ℝ^2 and we suppose
that there are two types of event: small events, affecting regions of radius O(1), which
might be thought of as 'ordinary' reproduction events; and 'large' events, representing
extinction/recolonisation events, affecting regions of radius O(L^α) where α ∈ (0, 1] is a
fixed parameter. In order to keep the notation as simple as possible, we shall only allow two
different radii for our events, R_s corresponding to 'small' reproduction events and R_B L^α
corresponding to 'large' local bottlenecks. We shall also suppose that the corresponding
proportions u_s and u_B are fixed. Neither of these assumptions is essential to the results,
which would carry over to the more general setting in which each of R_s, R_B, u_s and u_B is
sampled (independently) from given distributions each time an event occurs.

Let us specify the dynamics of the process more precisely. Let

• R_s, R_B ∈ (0, ∞), u_s, u_B ∈ (0, 1) and α ∈ (0, 1];

• λ_s, λ_B be two distributions on ℕ = {1, 2, ...} with bounded support and such that
λ_s({1}) < 1;

• (ρ_L)_{L∈ℕ} be an increasing sequence such that ρ_L ≥ log L for all L ∈ ℕ, and L^{-2α}ρ_L
tends to a finite limit (possibly zero) as L → ∞; and

• (r_L)_{L∈ℕ} be a non-increasing sequence with values in (0, 1].

For L ∈ ℕ, we denote by Π_s^L a Poisson point process on ℝ × T(L) with intensity measure
dt ⊗ dx, and by Π_B^L another Poisson point process on ℝ × T(L), independent of Π_s^L, with
intensity measure (ρ_L L^{2α})^{-1} dt ⊗ dx. The spatial Λ-Fleming-Viot process ϱ^L on T(L) evolves
as follows.

Small events: If (t, x) is a point of Π_s^L, a reproduction event takes place at time t within
the closed ball B(x, R_s).

• A number j is sampled according to the measure λ_s;

• j sites z_1, ..., z_j are selected uniformly at random from B(x, R_s); and

• for each i = 1, ..., j, a type (a_i, b_i) is sampled according to ϱ^L_{t-}(z_i).

If j > 1, then for all y ∈ B(x, R_s),

$$\varrho^L_t(y) := (1-u_s)\,\varrho^L_{t-}(y) + \frac{u_s(1-r_L)}{j}\sum_{i=1}^{j}\delta_{(a_i,b_i)} + \frac{u_s\,r_L}{j(j-1)}\sum_{i_1\neq i_2}\delta_{(a_{i_1},b_{i_2})}.$$

If j = 1, for each y ∈ B(x, R_s),

$$\varrho^L_t(y) := (1-u_s)\,\varrho^L_{t-}(y) + u_s\,\delta_{(a_1,b_1)}.$$

In both cases, sites outside B(x, R_s) are not affected.
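To make the small-event rule concrete, here is a minimal Python sketch of the update at a single site y ∈ B(x, R_s), with the distribution ϱ^L_{t-}(y) stored as a dictionary from type pairs (a, b) to probabilities. The function name and the dictionary representation are illustrative choices, not part of the model; the parental types (a_i, b_i) are passed in rather than sampled, and the inner double loop runs over the j(j - 1) ordered pairs i_1 ≠ i_2.

```python
from collections import defaultdict


def small_event_update(local, parents, u_s, r_L):
    """Type distribution at a site y in B(x, R_s) just after a small event.

    local   -- dict mapping type pairs (a, b) to their mass under rho_{t-}(y)
    parents -- list of the j parental types (a_i, b_i), already sampled
    u_s     -- proportion of the local population replaced by offspring
    r_L     -- proportion of offspring that are recombinants (used only if j > 1)
    """
    j = len(parents)
    new = defaultdict(float)
    # A fraction 1 - u_s of the local population survives unchanged.
    for ab, mass in local.items():
        new[ab] += (1 - u_s) * mass
    if j == 1:
        # A single parent: all offspring copy its type at both loci.
        new[parents[0]] += u_s
        return dict(new)
    # Non-recombinant offspring copy both loci from one parent.
    for ab in parents:
        new[ab] += u_s * (1 - r_L) / j
    # Recombinant offspring take locus 1 from parent i1 and locus 2 from parent i2 != i1.
    for i1, (a, _) in enumerate(parents):
        for i2, (_, b) in enumerate(parents):
            if i1 != i2:
                new[(a, b)] += u_s * r_L / (j * (j - 1))
    return dict(new)
```

Since the three weights (1 - u_s), u_s(1 - r_L) and u_s r_L add up to one, the returned masses always sum to one, which is a convenient check when experimenting with the rule.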

Large events: If (t, x) is a point of Π_B^L, an extinction/recolonisation event takes place at
time t within the closed ball B(x, L^α R_B).

• A number j is sampled according to the measure λ_B;

• j sites z_1, ..., z_j are selected uniformly at random from B(x, L^α R_B); and

• for each i = 1, ..., j, a type (a_i, b_i) is sampled according to ϱ^L_{t-}(z_i).

For each y ∈ B(x, L^α R_B),

$$\varrho^L_t(y) := (1-u_B)\,\varrho^L_{t-}(y) + \frac{u_B}{j}\sum_{k=1}^{j}\delta_{(a_k,b_k)}.$$

Again, sites outside the ball are not affected.

Remark 1.1. 1. The scheme of choosing j parental locations and then sampling a parental
type at each of those locations is convenient when one is interested in tracing lineages
ancestral to a sample from the population. It can be thought of as sampling j individuals,
uniformly at random from the ball affected by the reproduction (or extinction/recolonisation)
event, to reproduce. Of course this scheme allows for the possibility of more than one
parent contributing offspring, so that we should more correctly
call this model a spatial Ξ-Fleming-Viot process, but to emphasize the close link with
previous work we shall abuse terminology and use the name Λ-Fleming-Viot process.

2. The recombination scheme mirrors that generally employed in Moran models. The
quantity r_L is the proportion of offspring who, as a result of recombination, inherit
the types at the two loci from different parental chromosomes. We have chosen to
sample the types of those two chromosomes from different points in space. As a result,
provided the individuals sampled from the current population are in
distinct geographic locations, if two ancestral lineages are at spatial distance zero, then
they are necessarily in the same individual. This is mathematically convenient (cf.
Remark 3.8), but, arguably, not terribly natural biologically. However, changing the
sampling scheme, for example so that the two recombining chromosomes are sampled
from the same location, would not materially change our results.

3. We are assuming that recolonisation is so rapid after an extinction event that the 
effects of recombination during recolonisation are negligible. 

4. Since T(L) is compact, the overall rate at which events fall is finite for any L and
the corresponding spatial Λ-Fleming-Viot process with recombination is well-defined.
Notice that a given site, x say, is affected by a small event at rate πR_s^2 = O(1) (since
the centre of the event must fall within a distance R_s of x), whereas it is hit by a large
event at rate πR_B^2 ρ_L^{-1} = O(ρ_L^{-1}). So reproduction events are frequent, but massive
extinction/recolonisation events are rare.
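The per-site rates in item 4 come from a one-line computation: a site is covered whenever an event centre falls within the event radius of it, so each rate is the intensity of centres times the area of that disc. The short Python sketch below (function and argument names are ours) carries this out and shows why the large-event rate does not depend on L; it assumes the event discs do not wrap around the torus.

```python
import math


def per_site_event_rates(R_s, R_B, rho_L, L, alpha):
    """Rates at which a fixed site is covered by a small and by a large event.

    Each rate is (intensity of event centres) x (area of a disc of the event radius).
    Wrap-around on the torus is ignored (assumes 2 * R_B * L**alpha < L).
    """
    # Small events: centres at intensity 1 per unit area and time, radius R_s.
    small = 1.0 * math.pi * R_s**2
    # Large events: centres at intensity 1 / (rho_L * L**(2*alpha)), radius R_B * L**alpha,
    # so the factors L**(2*alpha) cancel and the rate is pi * R_B**2 / rho_L.
    large = math.pi * (R_B * L**alpha) ** 2 / (rho_L * L ** (2 * alpha))
    return small, large
```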

1.3 Genealogical relationships 

Having established the (forwards in time) dynamics of allele frequencies in our model, 
we now turn to the genealogical relationships between individuals in a sample from the 
population. 

First suppose that we are tracing the lineage ancestral to a single locus on a chromosome
carried by just one individual in the current population. Recombination does not affect us
and we see that the lineage will move in a series of jumps: if its current location is z, then
it will jump to z + x (resp. z + L^α x) due to a small (resp. large) event with respective
intensities

$$u_s\,\frac{L_{R_s}(0,x)}{\pi R_s^2}\,dx \qquad\text{and}\qquad u_B\,\frac{L_{R_B}(0,x)}{\pi R_B^2\,\rho_L}\,dx, \tag{1}$$

where L_R(x, y) denotes the volume of the intersection B(x, R) ∩ B(y, R) (viewed as a subset
of T(L) for the first intensity measure, and of T(L^{1-α}) for the second). To see this, note
first that by translation invariance of the model we may suppose that z = 0. In order for
the lineage to experience a small jump, say, from the origin to x, the origin and the position
x must be covered by the same event. This means that the centre of the event must lie in
both B(0, R_s) and B(x, R_s). The rate at which such events occur is L_{R_s}(0, x). The lineage
will only jump if it is sampled from the portion u_s of the population that are offspring of the
event and then it will jump to the position of its parent, which is uniformly distributed on
a ball of area πR_s^2. Combining these observations gives the first intensity in (1). A lineage
ancestral to a single locus in a single individual thus follows a compound Poisson process
on T(L).
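The intensities in (1) can be sanity-checked numerically. Integrating the small-event intensity over all jumps x should give u_s πR_s^2, the rate πR_s^2 at which the site is covered times the probability u_s that the lineage is among the offspring, because ∫ L_R(0, x) dx = (πR^2)^2. The Python sketch below uses the standard formula for the area of the lens B(0, R) ∩ B(x, R) in the plane and a midpoint rule in polar coordinates; it ignores wrap-around on the torus (harmless when 2R_s < L), and the function names are ours.

```python
import math


def lens_area(d, R):
    """Area L_R(0, x) of B(0, R) ∩ B(x, R) in the plane, where d = |x|."""
    if d >= 2 * R:
        return 0.0
    return 2 * R**2 * math.acos(d / (2 * R)) - 0.5 * d * math.sqrt(4 * R**2 - d**2)


def small_jump_total_rate(u_s, R_s, n=20000):
    """Integral over the plane of u_s * L_{R_s}(0, x) / (pi R_s^2) dx (polar midpoint rule)."""
    h = 2 * R_s / n
    integral = 0.0
    for k in range(n):
        d = (k + 0.5) * h
        # Area element of the annulus of radius d and width h is 2 * pi * d * h.
        integral += lens_area(d, R_s) * 2 * math.pi * d * h
    return u_s * integral / (math.pi * R_s**2)
```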

Suppose now that we sample a single individual, but trace back its ancestry at both loci. 
We start with a single lineage which moves, as above, in a series of jumps as long as it is 
in the fraction u_s(1 - r_L) of 'non-recombinants' in the population. However, every time it
is hit by a small event, there is a probability u_s r_L that it was created by recombination
from two parental chromosomes, whose locations are sampled uniformly at random from 
the region affected by the event. If this happens, we must follow two distinct lineages, one 
for each locus, which jump around T(L) in an a priori correlated manner (since they may 
be hit by the same events), until they coalesce again. This will happen if they are both 
affected by an event (small or large) and are both derived from the same parent (which for 
a given event has probability 1/j in our notation above). 
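The backward step just described can be sketched for one lineage carrying both loci whose site is covered by a small event. The sketch assumes parental locations uniform on the planar disc B(x, R_s) (wrap-around on the torus ignored) and uses Python's `random.Random`; the function names are hypothetical. It returns the parental locations of the two loci, which coincide unless the lineage is a recombinant.

```python
import math
import random


def uniform_in_disc(centre, R, rng):
    """Uniform point in the planar disc B(centre, R) (torus wrap-around ignored)."""
    radius = R * math.sqrt(rng.random())
    angle = 2 * math.pi * rng.random()
    return (centre[0] + radius * math.cos(angle), centre[1] + radius * math.sin(angle))


def backward_small_event(centre, R_s, u_s, r_L, j, rng):
    """Parental locations (locus 1, locus 2) of a two-locus lineage covered by a small event.

    Returns None if the lineage is not among the offspring of the event. Otherwise
    returns a pair of locations; the two entries coincide unless the lineage is a
    recombinant, in which case the loci trace back to two distinct parents.
    """
    if rng.random() >= u_s:
        return None  # the lineage lies in the unaffected fraction 1 - u_s
    parents = [uniform_in_disc(centre, R_s, rng) for _ in range(j)]
    if j > 1 and rng.random() < r_L:
        i1, i2 = rng.sample(range(j), 2)  # ordered pair of distinct parents
        return parents[i1], parents[i2]
    p = rng.choice(parents)  # both loci inherited from the same parent
    return p, p
```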

Thus, the ancestry of the two loci from our sampled individual is encoded in a system
of splitting and coalescing lineages. If we now sample two individuals, (A, B) and (a, b),
we represent their genealogical relations at the two loci by a process 𝒜 taking values in
the set of partitions of {A, a, B, b} whose blocks are labeled by an element of T(L). As in
[BEV10], at time t ≥ 0 each block of 𝒜_t contains the labels of all the lineages having a
common ancestor (i.e., carried by the same individual) t units of time in the past, and the
mark of the block records the spatial location of this ancestor. The only difference with the
ancestral process defined in [BEV10] is that blocks can now split due to a recombination
event.
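One possible in-memory representation of the marked partitions taken by the ancestral process is a list of blocks, each pairing a set of labels from {A, a, B, b} with a spatial mark. The hypothetical sketch below (names are ours) implements only the two structural moves described above: a recombination splitting the locus-1 labels of a block from its locus-2 labels, each part receiving its own mark, and a merger of blocks that descend from the same parent.

```python
from dataclasses import dataclass

LOCUS_1 = frozenset({"A", "a"})  # lineages ancestral at the first locus
LOCUS_2 = frozenset({"B", "b"})  # lineages ancestral at the second locus


@dataclass(frozen=True)
class Block:
    labels: frozenset  # lineages carried by one ancestral individual
    mark: tuple        # spatial location of that individual


def split(block, mark_1, mark_2):
    """Recombination: separate the locus-1 labels of a block from its locus-2 labels."""
    part_1 = block.labels & LOCUS_1
    part_2 = block.labels & LOCUS_2
    if not part_1 or not part_2:
        return [block]  # the block carries only one locus, so nothing splits
    return [Block(part_1, mark_1), Block(part_2, mark_2)]


def merge(blocks, mark):
    """Coalescence: the given blocks all descend from one parent located at `mark`."""
    labels = frozenset().union(*(blk.labels for blk in blocks))
    return Block(labels, mark)
```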

Of course, if r_L is small, then the periods of time when the lineages are in a single
individual, that is during which they have coalesced and not split apart again, can be 
rather extensive. This has the potential to create strong correlations between the two loci. 
The other source of correlation is the large events which can cause coalescences between 
lineages even when they are geographically far apart. To gain an understanding of these 
correlations, we ask the following question: 

The problem: Given α, ρ_L and r_L, is there a minimal distance D_L^* such that, asymptotically
as L → ∞,

• if we sample two individuals (A, B) and (a, b) at distance at least D_L^* from each other,
then the coalescence time of the ancestral lineages of A and a is independent of that
of the ancestral lineages of B and b (in other words, genealogies at the two loci are
completely decorrelated);

• if two individuals are sampled at a distance less than D_L^*, then the genealogies at the
two loci are correlated (i.e., the lineages ancestral to A and B, and similarly those of
a and b, remain sufficiently 'close together' for a sufficiently long time that there is a
significant chance that the coalescence of A and a implies that of B and b at the same
time or soon after)?

1.4 Main results 

Before stating our main results, we introduce some notation. We shall always denote the
types of the two individuals in our sample by (A, B) and (a, b). The same letters will be
used to distinguish the corresponding ancestral lineages. As we briefly mentioned in the last
section, the genealogical relationships between the two loci at time t ≥ 0 before the present
are represented by a marked partition of {A, a, B, b}, in which each block corresponds to an
individual in the ancestral population at time t who carries lineages ancestral to our sample.
The labels in the block are those of the corresponding lineages and the mark is the spatial
location of the ancestor. For any such marked partition 𝔞, we write P_𝔞 for the probability
measure under which the genealogical process starts from 𝔞, with the understanding that
marks then evolve on the torus T(L). Typically, our initial configuration will be of the form

$$\mathfrak a_L := \big\{(\{A,B\},\,x_1^L),\ (\{a,b\},\,x_2^L)\big\},$$

where the separation x_L := x_2^L - x_1^L between the two sampled individuals will be assumed
to be large. The coalescence times of the ancestral lineages at each locus are denoted by
τ_{Aa}^L and τ_{Bb}^L. Finally, we write |x| for the Euclidean norm of x ∈ ℝ^2 (or in a torus of any
size) and σ^2 > 0 is a constant defined later in the paper. (It corresponds, after a
suitable space-time rescaling, to the limit as L → ∞ of the variance of the displacement of
a lineage during a time interval of length one.)

For later comparison, we first record the asymptotic behaviour of the coalescence time 
at a single locus. The proof of the following result is in §3.
Proposition 1.2. Suppose that for each L ∈ ℕ the two individuals comprising our initial
configuration are at separation x_L ∈ T(L). Suppose also that log|x_L|/log L → β ∈ (α, 1] as
L → ∞. (In particular, α < 1 here.) Then,

a) For all t ∈ [β, 1],

$$\lim_{L\to\infty} P_{\mathfrak a_L}\Big[\tau^L_{Aa} > \rho_L L^{2(t-\alpha)}\Big] = \frac{\beta-\alpha}{t-\alpha}.$$

b) For all t > 0,

$$\lim_{L\to\infty} P_{\mathfrak a_L}\Big[\tau^L_{Aa} > \frac{1-\alpha}{2\pi\sigma^2}\,\rho_L L^{2(1-\alpha)}\log L\; t\Big] = \frac{\beta-\alpha}{1-\alpha}\,e^{-t}.$$
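Transcribing the two limits into code makes it easy to check that they are consistent with each other: regime a) at t = 1 and regime b) as t → 0 both give (β - α)/(1 - α), so the complementary probability (1 - β)/(1 - α) is the chance of coalescing on the faster timescale. The helper below is a hypothetical illustration, not part of the paper; it only evaluates the formulas of Proposition 1.2 for given α, β and t.

```python
import math


def single_locus_tail(beta, alpha, t, regime):
    """Limiting P[tau_Aa > ...] from Proposition 1.2.

    regime "a": time rho_L L^{2(t - alpha)}, t in [beta, 1]
    regime "b": time (1 - alpha)/(2 pi sigma^2) rho_L L^{2(1 - alpha)} log L * t, t > 0
    """
    if not alpha < beta <= 1:
        raise ValueError("need alpha < beta <= 1")
    if regime == "a":
        if not beta <= t <= 1:
            raise ValueError("regime a needs t in [beta, 1]")
        return (beta - alpha) / (t - alpha)
    if regime == "b":
        if t <= 0:
            raise ValueError("regime b needs t > 0")
        return (beta - alpha) / (1 - alpha) * math.exp(-t)
    raise ValueError("regime must be 'a' or 'b'")
```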



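Remark 1.3. Observe that the timescale considered in case b) above coincides with the
quantity ϖ_L defined in Theorem 3.3 of [BEV10]. Indeed, using the notation of [BEV10],
the variance is given by the following limit:

$$\sigma^2 = \lim_{L\to\infty}\Big(\sigma_B^2 + \frac{\rho_L}{L^{2\alpha}}\,\sigma_s^2\Big),$$

where σ_s^2 and σ_B^2 are defined in Equation (20) of [BEV10]. Now, if ρ_L L^{-2α} → 0 as in case
a) of Theorem 3.3, we obtain

$$\frac{1-\alpha}{2\pi\sigma^2}\,\rho_L L^{2(1-\alpha)}\log L = \frac{1-\alpha}{2\pi\sigma_B^2}\,\rho_L L^{2(1-\alpha)}\log L,$$

while if ρ_L L^{-2α} → 1/b > 0, we have

$$\frac{1-\alpha}{2\pi\sigma^2}\,\rho_L L^{2(1-\alpha)}\log L \sim \frac{1-\alpha}{2\pi\,(\sigma_B^2+\sigma_s^2/b)}\,\frac{L^2}{b}\,\log L = \frac{1-\alpha}{2\pi\,(\sigma_s^2+b\,\sigma_B^2)}\,L^2\log L.$$

In both cases, the timescale considered in Proposition 1.2 is the same as the quantity ϖ_L of
Theorem 3.3 of [BEV10].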
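The algebra behind the second case of Remark 1.3 can be checked numerically. This is a sketch under two stated assumptions: that ρ_L L^{-2α} equals 1/b exactly rather than only in the limit, and that in this regime σ^2 = σ_B^2 + σ_s^2/b. The parameter values are arbitrary and the function names are ours.

```python
import math


def timescale_prop_1_2(alpha, sigma2, rho_L, L):
    """(1 - alpha) / (2 pi sigma^2) * rho_L * L^{2(1 - alpha)} * log L."""
    return (1 - alpha) / (2 * math.pi * sigma2) * rho_L * L ** (2 * (1 - alpha)) * math.log(L)


def timescale_case_b(alpha, sigma2_s, sigma2_B, b, L):
    """(1 - alpha) / (2 pi (sigma_s^2 + b sigma_B^2)) * L^2 * log L."""
    return (1 - alpha) / (2 * math.pi * (sigma2_s + b * sigma2_B)) * L**2 * math.log(L)


# Arbitrary illustrative values.
alpha, b, s2_s, s2_B, L = 0.4, 3.0, 1.2, 0.7, 1.0e4
rho_L = L ** (2 * alpha) / b   # assumption: rho_L L^{-2 alpha} equals 1/b exactly
sigma2 = s2_B + s2_s / b       # assumption: sigma^2 = sigma_B^2 + sigma_s^2 / b here
lhs = timescale_prop_1_2(alpha, sigma2, rho_L, L)
rhs = timescale_case_b(alpha, s2_s, s2_B, b, L)
```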

In the case α = 0, Proposition 1.2 precisely matches corresponding results of [CG86]
and [ZCD05] for coalescing random walks on a torus in ℤ^2. For α > 0, we see that if
lineages start at a separation of O(L^β), with β > α, then the small events don't affect the
asymptotic coalescence times; they are the same as those for a random walk with bounded
jumps on T(L^{1-α}) started at separation O(L^{β-α}). In particular, the first statement tells
us that the chance that coalescence occurs at a time ≪ ρ_L L^{2(1-α)} log L is (1 - β)/(1 - α).
If this does not happen, then since the time taken for the random walks to reach their
equilibrium distribution is o(ρ_L L^{2(1-α)} log L), in these units, the additional time that we
must wait to see a coalescence is asymptotically exponential.

When we consider the genealogies at two loci, several regimes appear depending on the 
recombination rate and the initial distance between the individuals sampled. 

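Theorem 1.4. Suppose (𝔞_L)_{L∈ℕ} is as in Proposition 1.2. If

$$\limsup_{L\to\infty}\ \frac{\log\Big(1+\dfrac{\log\rho_L}{r_L\rho_L}\Big)}{2\log L} < \beta-\alpha, \tag{2}$$

then we have:

a) For all t ∈ [β, 1],

$$\lim_{L\to\infty} P_{\mathfrak a_L}\Big[\tau^L_{Aa}\wedge\tau^L_{Bb} > \rho_L L^{2(t-\alpha)}\Big] = \frac{(\beta-\alpha)^2}{(t-\alpha)^2}.$$

b) For all t > 0,

$$\lim_{L\to\infty} P_{\mathfrak a_L}\Big[\tau^L_{Aa}\wedge\tau^L_{Bb} > \frac{1-\alpha}{2\pi\sigma^2}\,\rho_L L^{2(1-\alpha)}\log L\; t\Big] = \frac{(\beta-\alpha)^2}{(1-\alpha)^2}\,e^{-2t}.$$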
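Under Condition (2), each limit in Theorem 1.4 is the square of the corresponding single-locus limit in Proposition 1.2, as one expects when the two loci behave independently. The hypothetical helper below (not from the paper) evaluates the two regimes and makes this product structure explicit.

```python
import math


def joint_tail_thm_1_4(beta, alpha, t, regime):
    """Limiting P[min(tau_Aa, tau_Bb) > ...] under Condition (2) of Theorem 1.4."""
    if regime == "a":      # time rho_L L^{2(t - alpha)}, t in [beta, 1]
        single = (beta - alpha) / (t - alpha)
    elif regime == "b":    # time (1 - alpha)/(2 pi sigma^2) rho_L L^{2(1 - alpha)} log L * t
        single = (beta - alpha) / (1 - alpha) * math.exp(-t)
    else:
        raise ValueError("regime must be 'a' or 'b'")
    # Independent loci: product of two identical single-locus tails.
    return single ** 2
```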
Under the conditions of Theorem 1.4, the individuals are initially sampled at a distance
much larger than the radius of the large events, and recombination is fast enough for the
coalescence times at the two loci to be asymptotically independent (see Remark 1.7). For
slower recombination rates this is no longer the case. When Condition (2) is not satisfied,
we have instead:

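Theorem 1.5. Suppose (𝔞_L)_{L∈ℕ} is as in Proposition 1.2. Assume there exists γ ∈ (β, 1)
such that

$$\lim_{L\to\infty}\ \frac{\log\Big(1+\dfrac{\log\rho_L}{r_L\rho_L}\Big)}{2\log L} = \gamma-\alpha. \tag{3}$$

Then,

a) For all t ∈ [β, γ],

$$\lim_{L\to\infty} P_{\mathfrak a_L}\Big[\tau^L_{Aa}\wedge\tau^L_{Bb} > \rho_L L^{2(t-\alpha)}\Big] = \frac{\beta-\alpha}{t-\alpha}.$$

b) For all t ∈ (γ, 1],

$$\lim_{L\to\infty} P_{\mathfrak a_L}\Big[\tau^L_{Aa}\wedge\tau^L_{Bb} > \rho_L L^{2(t-\alpha)}\Big] = \frac{(\beta-\alpha)(\gamma-\alpha)^2}{(\gamma-\alpha)(t-\alpha)^2}.$$

c) For all t > 0,

$$\lim_{L\to\infty} P_{\mathfrak a_L}\Big[\tau^L_{Aa}\wedge\tau^L_{Bb} > \frac{1-\alpha}{2\pi\sigma^2}\,\rho_L L^{2(1-\alpha)}\log L\; t\Big] = \frac{(\beta-\alpha)(\gamma-\alpha)^2}{(\gamma-\alpha)(1-\alpha)^2}\,e^{-2t}.$$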
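The limits in Theorem 1.5 can be read as a product: the probability (β - α)/(γ - α) that neither pair has coalesced by the end of the fully correlated phase, times the square of a single-locus survival probability from time ρ_L L^{2(γ-α)} onwards. The sketch below (hypothetical name, illustrative only) evaluates the three regimes; it can be used to check that regimes a) and b) agree at t = γ, and that b) at t = 1 matches c) as t → 0.

```python
import math


def joint_tail_thm_1_5(beta, gamma, alpha, t, regime):
    """Limiting P[min(tau_Aa, tau_Bb) > ...] under Condition (3) of Theorem 1.5."""
    if not alpha < beta < gamma < 1:
        raise ValueError("need alpha < beta < gamma < 1")
    # Probability that the (still fully correlated) pairs survive the correlated phase.
    correlated = (beta - alpha) / (gamma - alpha)
    if regime == "a":    # t in [beta, gamma]: both loci coalesce together
        return (beta - alpha) / (t - alpha)
    if regime == "b":    # t in (gamma, 1]: independent after the threshold
        return correlated * ((gamma - alpha) / (t - alpha)) ** 2
    if regime == "c":    # exponential timescale, t > 0
        return correlated * ((gamma - alpha) / (1 - alpha)) ** 2 * math.exp(-2 * t)
    raise ValueError("regime must be 'a', 'b' or 'c'")
```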



This time, we observe a 'phase transition' at time ρ_L L^{2(γ-α)}. Asymptotically, coalescence
times are completely correlated for times of O(ρ_L L^{2(γ-α)}), but conditional on being
greater than this 'decorrelation threshold' they are independent. To understand this threshold,
recall from Proposition 1.2 that, initially, coalescence of lineages ancestral to a single
locus happens on the exponential timescale ρ_L L^{2(t-α)}, t ∈ [β, 1], and is driven by large events.
This tells us that the effect of recombination will be felt only if exactly one of the lineages
ancestral to A and B (or to a and b) is 'hit' by a large event. Since recombination events
between A and B result in only a small separation of the corresponding ancestral lineages,
we can expect that many of them will rapidly be followed by coalescence of the corresponding
lineages (due to small events). This leads us to the idea of an 'effective' recombination
event, which is one following which at least one of the lineages ancestral to A and B is
affected by a large event before they coalesce due to small events. We shall see in Proposition
4.1 that recombination is 'effective' on the linear timescale ρ_L(1 + (log ρ_L)/(r_L ρ_L)) t,
t ≥ 0. Under Condition (3) the timescales of coalescence and effective recombination cross
over precisely at time ρ_L L^{2(γ-α)}.
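The crossover can be made explicit: equating the coalescence timescale ρ_L L^{2(t-α)} with the effective-recombination timescale ρ_L(1 + (log ρ_L)/(r_L ρ_L)) and solving for t gives α + log(1 + (log ρ_L)/(r_L ρ_L))/(2 log L), whose limit is γ under Condition (3). The sketch below evaluates this finite-L exponent; the function names are hypothetical, and constants in the timescales are ignored, as in the heuristic above.

```python
import math


def effective_recombination_timescale(rho_L, r_L):
    """rho_L * (1 + log(rho_L) / (r_L * rho_L)), the effective-recombination scale at t = 1."""
    return rho_L * (1 + math.log(rho_L) / (r_L * rho_L))


def crossover_exponent(rho_L, r_L, L, alpha):
    """Exponent t solving rho_L * L**(2*(t - alpha)) == effective_recombination_timescale."""
    ratio = effective_recombination_timescale(rho_L, r_L) / rho_L
    return alpha + math.log(ratio) / (2 * math.log(L))
```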

Two cases remain: 

The case α < β < 1, γ ≥ 1: If γ > 1, the arguments of the proof of Theorem 1.5 show
that the recombination is too slow to be effective on the timescale of coalescence and
so the coalescence times at the two loci are completely correlated and are given by
Proposition 1.2. For γ = 1, the result depends on the precise form of (log ρ_L)/(r_L ρ_L).
If it remains close enough to L^{2(1-α)} (or smaller), the proof of Theorem 1.5 shows
that lineages are completely correlated on the timescale ρ_L L^{2(t-α)}, t < 1, and then,
conditional on not having coalesced before ρ_L L^{2(1-α)}, they evolve independently on
the timescale ρ_L L^{2(1-α)} log L t. On the other hand, if (log ρ_L)/(r_L ρ_L L^{2(1-α)}) grows
to infinity sufficiently fast, then, just as in the case γ > 1, recombination is too slow
to be effective.
The case β < α < 1: If we drop our assumption that the separation of the individuals in
our sample is much greater than the radius of the largest events, then we can no longer
make such precise statements. Proposition 6.4(a) in [BEV10] (with ψ_L = L^α) shows
that the coalescence time for lineages ancestral to a single locus will now be at most
O(ρ_L). This does tell us that if r_L ρ_L → 0 as L → ∞, then asymptotically we will not
see any recombination before coalescence and the coalescence times τ_{Aa}^L and τ_{Bb}^L are
identical. However, in contrast to the setting of Proposition 1.2, even asymptotically,
their common value depends on the exact separation of the individuals sampled. The
same reasoning is valid when r_L ≪ (log ρ_L)/ρ_L. In this case, although we may see
some recombination events before any coalescence occurs, a closer look at the proof of
Proposition 4.1 reveals that the time spent in distinct individuals by the two lineages
ancestral to A and B, say, in O(ρ_L) units of time is negligible compared to ρ_L. Thus,
with high probability, any large event affecting lineages ancestral to our sample will
occur at a time when the lineages ancestral to A and B are in the same individual
(as are those ancestral to a and b). As a result, once again τ_{Aa}^L = τ_{Bb}^L with probability
tending to 1.

On the other hand, suppose r_L remains large enough that lineages ancestral to A and
B have a chance to be hit by a large event while they are in different individuals
and thus jump to a separation O(L^α) (the effective recombination of §4.1). We are
still unable to recover precise results. The reason is that even after such an event,
we may be in a situation in which all lineages could be hit by the same large event,
or at least remain at separations O(L^α). But we shall see that a key to the proofs
of Theorems 1.4 and 1.5 is the fact that, in the settings considered there, where
individuals are sampled from far apart, whenever two lineages come to within 2R_B L^α
of one another, the other ancestral lineages are still very far from them. This gives
the pair time to merge without 'interference' from the other lineages. Since lineages
at separations O(L^α) are correlated and their coalescence times depend strongly on
their precise (geographical) paths on this scale, it is difficult to quantify the extent
to which the fact that the ancestral lines of A, B and of a, b start within the same
individuals makes the coalescence times τ_{Aa}^L and τ_{Bb}^L more correlated. Nonetheless,
this is an important question and will be addressed elsewhere.

To answer our initial question, we see from Theorems 1.4 and 1.5 (and the subsequent
discussion) that D_L^* is informally given by

$$\log\Big(1+\frac{\log\rho_L}{r_L\rho_L}\Big) \approx 2\big(\log D_L^* - \alpha\log L\big), \quad\text{i.e.}\quad \frac{(D_L^*)^2}{L^{2\alpha}} \approx 1 + \frac{\log\rho_L}{r_L\rho_L}.$$

When the sampling distance is greater than the radius of the largest events, correlated
genealogies are only possible when recombination is slow enough, or large events occur
rarely enough, that (log ρ_L)/(r_L ρ_L) ≫ 1. If for instance r_L ≡ r > 0, the two loci are
always asymptotically decorrelated. On the other hand, if γ is as in (3) (note that γ does
not need to exist for Condition (2) to hold) and the sampling distance is L^β, Theorem 1.4
shows that if β > γ the genealogies at the two loci are asymptotically independent, whereas
Theorem 1.5 tells us that if β ∈ (α, γ), there is a first phase of complete correlation. Thus,
D_L^* ≈ L^γ in this case.
Before closing this section, let us make two remarks: 

Remark 1.6. (Bounds on the rates of large events.) Recall that we imposed the
condition log L ≤ ρ_L ≤ C L^{2α}. The reason for the upper bound is that in [BEV10], we
showed that the coalescence of the ancestral lineages is then driven by the large events and,
moreover, is very rapid once lineages are at separation O(L^α) (see the proof of Theorem 3.3
in [BEV10]). Similar results should hold, although on different timescales, in the other
cases presented in [BEV10]. However, to keep the presentation of our results as simple as
possible, we have chosen to restrict attention to the regime allowed by this upper bound. The (rather
undemanding) lower bound is needed in the proof of Proposition 4.4.

Remark 1.7. (Generalisation of Theorems 1.4 and 1.5 to distinct coalescence
times.) In these two theorems, we could also consider the probabilities of events of the
form {τ_{Aa}^L > ρ_L L^{2(t-α)} and τ_{Bb}^L > ρ_L L^{2(t'-α)}}, with t < t'. However, they can be computed
by a simple application of Theorem 1.4 or 1.5 at time t, and the Markov property. Indeed,
arguments similar to those of the proofs of Lemma C and Lemma 3.7 in §3 tell us that
the distance between lineages ancestral to B and b at time ρ_L L^{2(t-α)}, conditional on not
having coalesced by this time, lies in [L^t/log L, L^t log L]. Proposition 1.2 then enables us
to conclude. We leave this generalisation to the reader.

The rest of the paper is laid out as follows. In §2 we provide more detail of the motivation
for the question addressed here. In §3 we prove Proposition 1.2 and collect several results
on genealogies of a sample from a single locus that we shall need in the sequel. Since most of
these results are close to those established in [BEV10], or require techniques used in [CG86]
and [ZCD05] for similar questions on the discrete torus, their proofs will only be sketched.
Our main results are proved in §4: we define an effective recombination rate in §4.1, use it
to find an upper bound on the time we must wait before the two lineages ancestral to A
and B start to evolve independently in §4.2, and finally derive the asymptotic coalescence
times of our two pairs of lineages in §4.3.

2 Biological motivation 

In this section we expand on the biological motivation for our work. 

It has long been understood that for many models of spatially distributed populations, if individuals are sampled sufficiently far from one another, then the genealogical tree that records the relationships between the alleles carried by those individuals at a single locus is well approximated by a Kingman coalescent with an 'effective population size' capturing the influence of the geographical structure. If the underlying population model is a stepping stone model, with the population residing in discrete demes located at the vertices of $\mathbb{Z}^2$ or $\mathbb{T}(L)\cap\mathbb{Z}^2$, individuals reproducing within demes and migration modelled as a random walk, then the genealogical trees relating individuals in a finite sample from the population are traced out by a system of coalescing random walks. The case in which random walks coalesce instantly on meeting corresponds (loosely) to a single individual living in each deme, in which case the stepping stone model reduces to the voter model. In this setting, and with symmetric nearest neighbour migration, convergence to the Kingman coalescent as the separation of individuals in the initial sample tends to infinity was established for $\mathbb{Z}^2$ in [CG86, CG90], and for $\mathbb{T}(L)\cap\mathbb{Z}^2$ in [Cox89]. In [CD02, ZCD05], Zähle, Cox and Durrett prove the same kind of convergence for coalescing random walks on $\mathbb{T}(L)\cap\mathbb{Z}^2$ with finite variance jumps and delayed coalescence (describing the genealogy of a sample from Kimura's stepping stone model on the discrete torus, in which reproduction within each deme is modelled by a Wright-Fisher diffusion). In [LS06], Limic and Sturm prove the analogous result when mergers between random walks within a deme are not necessarily pairwise. In the same spirit, but on the continuous space $\mathbb{T}(L)$ and with additional large extinction/recolonisation events (similar to those described in §1.2), the same asymptotic behaviour is obtained in [BEV10] for the systems of coalescing compound Poisson processes describing the genealogy of a sample from the spatial Λ-Fleming-Viot process, under suitable conditions on the frequency and extent of the large events.
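To make the dual picture concrete, here is a minimal Python sketch of two instantly coalescing nearest-neighbour random walks on the discrete torus $\mathbb{T}(L)\cap\mathbb{Z}^2$, the dual of the voter model. It is our own illustration and is not taken from the cited works. The function names and the discrete-time convention (at each step one of the two walkers, chosen uniformly, jumps) are assumptions.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def meeting_time(L, start_a, start_b, rng, max_steps=10**7):
    """Steps until two coalescing random walks on (Z/LZ)^2 first meet.

    At each step one walker, chosen uniformly at random, makes a
    nearest-neighbour jump (with wrap-around).  The walks coalesce
    instantly when they occupy the same site.  Returns None if they have
    not met after max_steps steps.
    """
    a, b = list(start_a), list(start_b)
    if a == b:
        return 0
    for step in range(1, max_steps + 1):
        walker = a if rng.random() < 0.5 else b
        dx, dy = rng.choice(MOVES)
        walker[0] = (walker[0] + dx) % L
        walker[1] = (walker[1] + dy) % L
        if a == b:
            return step
    return None


def mean_meeting_time(L, trials, seed=0):
    """Monte Carlo mean meeting time for walkers started half a torus apart."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        t = meeting_time(L, (0, 0), (L // 2, L // 2), rng)
        if t is None:
            raise RuntimeError("walkers did not meet within max_steps")
        total += t
    return total / trials
```

Comparing `mean_meeting_time` for a few values of L gives a rough feel for how the meeting time grows. It should be of order $L^2\log L$, the scale appearing in the two-dimensional results cited above.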

In all of these examples, the result stems from a separation of timescales. For example, in [BEV10] we were concerned with the genealogy of a sample picked uniformly at random from the whole torus. Under this assumption, the time that two lineages need to be 'gathered' close enough together that they can both be affected by the same event dominates the additional time the lineages take to coalesce, having been gathered. As explained in §1.4, this decomposition does not hold when lineages start too close together, and so the tools developed for well-separated samples are of no use in the study of local correlations. However, although we still cannot make precise statements about the genealogy of samples which are initially too close together, the work of §4.1 and §4.2, which is concerned with 'effective recombination' and 'decorrelation', provides a much better understanding than we had before of the local mechanisms that create correlations between nearby lineages, how strong these correlations are, and how to 'escape' them.

Our main results in this paper are concerned with samples taken at 'intermediate' scales. Individuals are sampled at pairwise distances much larger than the radius of the largest events, but these distances can still be much less than the radius of the torus. In this case, the 'gathering time' of two lineages starting at separation $x_L$ depends on that separation, but asymptotically this dependence is only through $\log|x_L|/\log L$. As in the case of a uniform sample, the gathering time dominates the additional time to coalescence. In Theorem 3.3 of [BEV10] we showed that if we sample a finite number of individuals uniformly at random from the geographic range of a population which is subject to small and large demographic events, then, measuring time in units of size $\omega_L=\frac{1-\alpha}{2\pi\sigma^2}\,(\rho_L/L^{2\alpha})\,L^2\log L$ (under the assumption on $\rho_L$ used here), their genealogical tree is determined by Kingman's coalescent. In particular, if $\rho_L<L^{2\alpha}$ (i.e., large events are not too rare), one major effect of the presence of large extinction/recolonisation events is to reduce the effective population size and, consequently, genetic diversity. The assumption of uniform sampling guarantees that ancestral lineages are initially $O(L/\log L)$ apart. Proposition 1.2 extends the result by showing that, if we sample our individuals from much closer together, then we should consider two timescales. The first is $(\rho_L/L^{2\alpha})L^{2t}$, $t\in[\beta,1]$. The second kicks in after $O(\rho_L L^{2(1-\alpha)})$, when the lineages start to feel the fact that space is limited, and their ancestries evolve on the linear timescale $t\,\omega_L$. By the same reasoning, if there were no large events these timescales would be, respectively, $L^{2t}$, $t\in[\beta,1]$, and $L^2\log L\,t$, $t>0$. Of course one never observes genealogies directly and so, for illustration, we introduce (infinitely many alleles) mutation into our model. We then compute the probability that two individuals sampled at a given separation are identical by descent (IBD), as a function of the exponent $\beta$. This is the probability that the two individuals carry the same type (at a given locus) because it was inherited from a common ancestor.

Since mutations are generally assumed to occur at a linear rate, whilst the first phase of the genealogical tree develops on a much slower exponential timescale, for a given time parameter $t\in[\beta,1]$ we would see, asymptotically as $L\to\infty$, either zero or infinitely many mutations on the tree. However, let us suppose that $L$ is large and write $\theta$ for the mutation rate at locus A. We denote by $c_L$ the ratio $\rho_L/L^{2\alpha}$. IBD is equivalent to our individuals experiencing no mutation between the time of their most recent common ancestor and the present. Hence the probability of IBD of two individuals sampled at separation $x_L$, with $\log|x_L|/\log L\to\beta$, is given by

\[
\begin{aligned}
\mathbb{P}_{x_L}[\mathrm{IBD}]
&=\int_0^{c_LL^2}e^{-2\theta t}\,\mathbb{P}_{x_L}\big[\tau_L^{Aa}\in dt\big]+\int_{c_LL^2}^{\infty}e^{-2\theta t}\,\mathbb{P}_{x_L}\big[\tau_L^{Aa}\in dt\big]\\
&\approx(\beta-\alpha)\int_\beta^1\frac{e^{-2\theta c_LL^{2u}}}{(u-\alpha)^2}\,du+\frac{\beta-\alpha}{1-\alpha}\int_{1/\log L}^{\infty}e^{-2\theta c_LL^2\log L\,u}\,e^{-u}\,du,
\end{aligned}\tag{4}
\]

where the last line uses a change of variable and the results of Proposition 1.2. The corresponding quantity when there are no large events is given by

\[
\beta\int_\beta^1\frac{e^{-2\theta L^{2u}}}{u^2}\,du+\beta\int_{1/\log L}^{\infty}e^{-2\theta L^2\log L\,u}\,e^{-u}\,du.
\]

Figure 1: Probability of IBD at a single locus, as a function of $\beta$. Here, $L=10^{\ldots}$, $\alpha=0.1$, $c_L=0.01$ and $\theta=10^{-\ldots}$. The solid line corresponds to the case with small and large events, the dash-dot line to the case with only small events. Geographical correlations vanish around $\beta=0.32$ without large events, and remain positive up to $\beta=0.52$ when large events occur.

The leading term in each sum is the first one, and we thus see that if $c_L\ll1$ (i.e., $\rho_L\ll L^{2\alpha}$), then, as expected, the probability of IBD is higher in the presence of large events and, moreover, as a consequence of shorter genealogies, correlations between gene frequencies persist over longer spatial scales. See Figure 1 for an illustration (in which only the leading terms are plotted). In classical models IBD decays approximately exponentially with the sampling distance, at least over small scales. In [BKE10], a numerical investigation of a model similar to the one presented here revealed approximately exponential decay over small scales followed by a transition to a different exponential rate over somewhat larger scales. Since the (rigorous) results of Proposition 1.2 only apply to sufficiently well-separated samples, our arguments above cannot capture this. They do, on the other hand, give a clear indication of the reduction of effective population size due to large events.
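The leading terms plotted in Figure 1 are one-dimensional integrals that are easy to evaluate numerically. The Python sketch below computes $(\beta-\alpha)\int_\beta^1 e^{-2\theta c_LL^{2u}}(u-\alpha)^{-2}\,du$ with Simpson's rule. Setting $\alpha=0$ and $c_L=1$ gives the corresponding term $\beta\int_\beta^1 e^{-2\theta L^{2u}}u^{-2}\,du$ without large events. The function names and the parameter values in the demo are illustrative assumptions, not the values used for Figure 1.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for f on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def ibd_leading_term(beta, L, theta, alpha=0.0, c_L=1.0):
    """Leading term of the single-locus IBD probability at exponent beta."""
    if not alpha < beta <= 1:
        raise ValueError("need alpha < beta <= 1")

    def integrand(u):
        # math.exp underflows harmlessly to 0.0 for very negative arguments
        return math.exp(-2 * theta * c_L * L ** (2 * u)) / (u - alpha) ** 2

    return (beta - alpha) * simpson(integrand, beta, 1.0)


if __name__ == "__main__":
    L, theta = 1e5, 1e-4  # illustrative values only
    for beta in (0.2, 0.3, 0.4, 0.5, 0.6):
        with_large = ibd_leading_term(beta, L, theta, alpha=0.1, c_L=0.01)
        small_only = ibd_leading_term(beta, L, theta)
        print(f"beta={beta:.1f}  with large events={with_large:.3f}  small only={small_only:.3f}")
```

With $\theta=0$ the integral can be done by hand: $(\beta-\alpha)\big(\frac{1}{\beta-\alpha}-\frac{1}{1-\alpha}\big)=1-\frac{\beta-\alpha}{1-\alpha}$. This is a convenient check of the quadrature.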

Local bottlenecks are not the only explanation for a reduced effective population size. For example, selection or fluctuating population sizes can have the same effect, and so we should like to find a more 'personal' signature of the presence of demographic events of different orders of magnitude. The idea that we explore here is to consider several loci on the same chromosome, subject to recombination, and to investigate the pattern of linkage disequilibrium obtained under the assumptions of §1.4. Using the results of Theorems 1.4 and 1.5, we have



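\[
\mathbb{P}_{x_L}[\text{IBD at both loci}]=\mathbb{E}_{x_L}\Big[e^{-2\theta_1\tau_L^{Aa}-2\theta_2\tau_L^{Bb}}\Big],
\]

where $\theta_1$ and $\theta_2$ denote the mutation rates at each locus. By the same computations as in (4), the leading terms in this expression are

\[
(\beta-\alpha)\int_\beta^{\gamma}\frac{e^{-2(\theta_1+\theta_2)c_LL^{2u}}}{(u-\alpha)^2}\,du
+(\beta-\alpha)(\gamma-\alpha)\int_\gamma^1\frac{e^{-2\theta_1c_LL^{2u}}}{(u-\alpha)^2}\,du\int_\gamma^1\frac{e^{-2\theta_2c_LL^{2u}}}{(u-\alpha)^2}\,du,
\tag{5}
\]

where the first integral is zero if there is no first period of complete correlation.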



On the other hand, when there are no large events, the analysis of Lemma 4.3 (with effective recombination replaced by recombination and the separation to attain of the order of $L$) tells us that the time two lineages initially in the same individual need to 'decorrelate' is of the order of $r_L^{-1}\log L$. Here $r_L^{-1}$ is the expected time to wait until we see a recombination event. The factor $\log L$ is (roughly) the mean number of recombination events before we see one after which the lineages remain separated for a duration $O(L^{2t})$ for some $t\in[\beta,1]$. Hence, when there are only small events, the leading terms in the probability of IBD at both loci are

\[
\beta\int_\beta^{\gamma_2}\frac{e^{-2(\theta_1+\theta_2)L^{2u}}}{u^2}\,du
+\beta\gamma_2\int_{\gamma_2}^1\frac{e^{-2\theta_1L^{2u}}}{u^2}\,du\int_{\gamma_2}^1\frac{e^{-2\theta_2L^{2u}}}{u^2}\,du,
\]

where we have set $\gamma_2:=\log(r_L^{-1}\log L)/(2\log L)$ and the first integral is again zero if $\beta>\gamma_2$.
Figure 2 compares the different curves obtained when (i) we always have decorrelation ($\gamma<\alpha$), (ii) we always have complete correlation ($\gamma>1$), or (iii) we have a transition between these two regimes ($\gamma\in(\alpha,1)$). As expected, we see that the probability of IBD at both loci is higher in the presence of large events (when $\rho_L<L^{2\alpha}$), and there is correlation between the two loci when individuals are sampled over large spatial distances. Furthermore, (5) gives us an idea of how the correlations between the two loci decay with sampling distance, as this grows from the radius of the large events to the whole population range. Correlations for sampling distances smaller than or equal to the size of the large events will be the object of future work.
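The two-locus leading terms can be evaluated in the same way. The sketch below assumes that they have the form $(\beta-\alpha)\int_\beta^{g}\frac{e^{-2(\theta_1+\theta_2)c_LL^{2u}}}{(u-\alpha)^2}du+(\beta-\alpha)(g-\alpha)\prod_{m=1,2}\int_g^1\frac{e^{-2\theta_mc_LL^{2u}}}{(u-\alpha)^2}du$, with $g=\min(\max(\gamma,\beta),1)$. In words, there is a fully correlated phase up to the exponent $\gamma$, followed by independent coalescence at the two loci. This form, the clamping convention for $g$, and the helper names are our assumptions rather than statements from the text. The sketch also evaluates $\gamma_2=\log(r_L^{-1}\log L)/(2\log L)$; for the case without large events, take $\alpha=0$, $c_L=1$ and $\gamma=\gamma_2$.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for f on [a, b]; returns 0 for an empty interval."""
    if b <= a:
        return 0.0
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def gamma2(r, L):
    """gamma_2 := log(r^{-1} log L) / (2 log L), r being the recombination rate."""
    return math.log(math.log(L) / r) / (2 * math.log(L))


def two_locus_leading(beta, gamma, L, theta1, theta2, alpha=0.0, c_L=1.0):
    """Assumed leading terms of P[IBD at both loci] (see the lead-in)."""
    g = min(max(gamma, beta), 1.0)  # end of the fully correlated phase

    def weight(theta):
        return lambda u: math.exp(-2 * theta * c_L * L ** (2 * u)) / (u - alpha) ** 2

    correlated = (beta - alpha) * simpson(weight(theta1 + theta2), beta, g)
    independent = ((beta - alpha) * (g - alpha)
                   * simpson(weight(theta1), g, 1.0)
                   * simpson(weight(theta2), g, 1.0))
    return correlated + independent
```

With $\theta_1=\theta_2=0$ the integrals are explicit. The result then reduces to $1-\frac{\beta-\alpha}{g-\alpha}+\frac{\beta-\alpha}{g-\alpha}\big(\frac{1-g}{1-\alpha}\big)^2$, which can serve as a check.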



3 Genealogies at one locus 

In this section we prove Proposition 1.2. In the process we introduce a rescaling of the spatial motion of our ancestral lineages. We also collect together several results on the time required to 'gather' two lineages to within distance $2R_BL^\alpha$, which will be needed again in Section 4. Since the techniques closely mirror those used in previous work, in the interests of brevity, we restrict ourselves to sketching the proofs and providing references where appropriate. Assume for the rest of this section that $\alpha<1$.

Figure 2: Probability of IBD at both loci, as a function of $\beta$. As in Figure 1, $L=10^{\ldots}$, $\alpha=0.1$, $c_L=0.01$ and $\theta_1=\theta_2=10^{-\ldots}$. The solid line corresponds to the case $\gamma>1$ (complete correlation for any $\beta$), the dotted line to the case $\gamma<\alpha$ (decorrelation for any $\beta$), and the dashed line to the intermediate case $\gamma=0.4$. The dash-dot line corresponds to the case without large events, for which $\gamma_2$ is computed from the same parameter values (i.e., $\gamma_2=0.2$).

The following local central limit theorem, corresponding to Lemma 5.4 of [BEV10], is the key to understanding the behaviour of two lineages. Suppose that for each $L\in\mathbb{N}$, $\xi^L$ is a Lévy process on $\mathbb{T}(L)$ such that $\xi^L(1)-\xi^L(0)$ has a covariance matrix of the form $\sigma_L^2\,\mathrm{Id}$, and that

(i) there exists $\sigma^2>0$ such that $\sigma_L^2\to\sigma^2$ as $L\to\infty$;

(ii) $\mathbb{E}_0\big[|\xi^L(1)|^{\ldots}\big]$ is bounded uniformly in $L$.

We shall implicitly suppose that all processes $\xi^L$ are defined on the same probability space, and that under the probability measure $\mathbb{P}_x$ the Lévy process we consider starts at $x$. Let $(d_L)_{L\ge1}$ be a sequence of positive reals such that $\liminf_{L\to\infty}d_L>0$ and $\log d_L/\log L\to\eta\in[0,1)$. Finally, let us write $p^L(x,t)$ for $\mathbb{P}_x[\xi^L(t)\in B(0,d_L)]$ and $\lfloor z\rfloor$ for the integer part of $z\in\mathbb{R}$.



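Lemma A [5.4 in [BEV10]]

a) Let $\varepsilon_L:=(\log L)^{-1/2}$. There exists a constant $C_1<\infty$ such that for every $L\ge2$, […].

b) If $v_L\to\infty$ as $L\to\infty$, then
\[
\lim_{L\to\infty}\ \sup_{t\ge v_LL^2}\ \sup_{x\in\mathbb{T}(L)}\Big|\frac{L^2}{\pi d_L^2}\,p^L(x,t)-1\Big|=0.
\]

c) If $u_L\to\infty$ as $L\to\infty$ and $I(d_L,x):=1+(|x|^2\vee d_L^2)$, then
\[
\lim_{L\to\infty}\ \sup_{x\in\mathbb{T}(L)}\ \sup_{u_LI(d_L,x)\le t\le\varepsilon_LL^2}\Big|\frac{2\sigma^2t}{d_L^2}\,p^L(x,t)-e^{-|x|^2/(2\sigma^2t)}\Big|=0.
\]

d) There exists a constant $C_2<\infty$ such that for every $L\ge1$,
\[
\sup_{t>0}\ \sup_{x\in\mathbb{T}(L)}\big(\ldots\big)\,p^L(x,t)\le C_2.
\]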

What Lemma A shows is that, for times which are large but of order at most $O(L^2)$, $\xi^L$ behaves like two-dimensional Brownian motion (case c), and in particular it has not yet explored the torus enough to 'see' that space is limited. On the other hand, $\xi^L(t)$ is nearly uniformly distributed over $\mathbb{T}(L)$ at any time much greater than $L^2$ (case b).
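The two regimes described by Lemma A can be visualised with a heuristic Gaussian approximation. This is our illustration, not part of the lemma or its proof. Approximate $\xi^L(t)$ by a normal variable with covariance $\sigma^2t\,\mathrm{Id}$ wrapped onto $\mathbb{T}(L)$, and $p^L(x,t)$ by $\pi d_L^2$ times the wrapped density at the origin. For $t\ll L^2$ this is close to $\frac{d_L^2}{2\sigma^2t}e^{-|x|^2/(2\sigma^2t)}$ (the Brownian regime of case c). For $t\gg L^2$ it is close to $\pi d_L^2/L^2$ (the uniform regime of case b). The function name and the truncation rule for the image sum are assumptions.

```python
import math


def wrapped_gaussian_ball_prob(x, t, d, sigma2, L):
    """Heuristic approximation of p^L(x, t) = P_x[xi^L(t) in B(0, d)] on the torus of side L.

    Returns pi * d**2 times the density at 0 of a normal law with mean x and
    covariance sigma2 * t * Id, periodised over the images x + (i*L, j*L).
    Only meaningful when d is small compared with sqrt(sigma2 * t).
    """
    var = sigma2 * t
    n = int(4 * math.sqrt(var) / L) + 2  # images beyond ~4 standard deviations are negligible
    density = 0.0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = x[0] + i * L, x[1] + j * L
            density += math.exp(-(dx * dx + dy * dy) / (2 * var))
    density /= 2 * math.pi * var
    return math.pi * d * d * density
```

For example, with $L=10$, $\sigma^2=1$ and $t=100L^2$ the value agrees with $\pi d^2/L^2$ to within a relative error of about $10^{-4}$. On a torus of side $1000$ at $t=10$, it coincides with the planar Gaussian formula.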

Fix $R>0$. As a direct corollary of this local central limit theorem, we proved in Lemma 5.5 of [BEV10] that, if $T(R,\xi^L)$ denotes the entrance time of $\xi^L$ into the ball $B(0,R)$, then the following inequality holds.

Lemma B [5.5 in [BEV10]] Let $(U_L)_{L\ge1}$ and $(u_L)_{L\ge1}$ be two sequences increasing to infinity such that $U_LL^{-2}\to\infty$ as $L\to\infty$ and $2u_L<L^2(\log L)^{-1/2}$ for every $L\ge1$. Then there exist $C_0>0$ and $L_0\in\mathbb{N}$ such that for every sequence $(U'_L)_{L\ge1}$ satisfying $U'_L\ge U_L$ for each $L$, every $L\ge L_0$ and all $x\in\mathbb{T}(L)$,
\[
\mathbb{P}_x\big[T(R,\xi^L)\in[U'_L-u_L,U'_L]\big]\le\ldots
\]

Lemma B tells us about the regime in which $\xi^L$ has already homogenised over $\mathbb{T}(L)$. Using exactly the same method, but employing parts c) and d) of Lemma A rather than b), we obtain the analogous result for the regime in which $\xi^L$ behaves as Brownian motion on $\mathbb{R}^2$.

Lemma 3.1. If $U_L<L^2(\log L)^{-1/2}$ for each $L\ge1$, $U_L,u_L\to\infty$ and $u_L/U_L\to0$ as $L\to\infty$, then there exist $C_1>0$ and $L_1\in\mathbb{N}$ such that for every sequence $(U'_L)_{L\ge1}$ satisfying $U_L\le U'_L\le L^2(\log L)^{-1/2}$ for each $L$, for every $L\ge L_1$ and $x\in\mathbb{T}(L)$,
\[
\mathbb{P}_x\big[T(R,\xi^L)\in[U'_L-u_L,U'_L]\big]\le\ldots
\]

Let us now introduce the processes to which we wish to apply these results. For each $L\in\mathbb{N}$, let $\{X_L^{Aa}(t),\ t\ge0\}$ be the process recording the difference between the locations on $\mathbb{T}(L)$ of the ancestral lineages of A and a (that is, the first locus of each of the two individuals sampled). The process $X_L^{Aa}$ is the difference between two dependent compound Poisson processes. Under the probability measures we shall use, it is a Markov process (see Remark 3.2). Observe that, because the largest events have radius $R_BL^\alpha$, the lineages have to be within a distance less than $2R_BL^\alpha$ of each other to be hit by the same event. As a consequence, the law of $X_L^{Aa}$ outside $B(0,2R_BL^\alpha)$ is equal to that of the difference of two i.i.d. Lévy processes, each of which follows the evolution given in (1). It is thus also equal to the law of the motion of a single lineage run at twice the speed. Writing $\mathcal{Y}^L$ for a process with this law, we define the processes $\mathcal{X}_L^{Aa}$ and $Y^L$ by

\[
\mathcal{X}_L^{Aa}(t)=L^{-\alpha}X_L^{Aa}(\rho_Lt)\quad\text{and}\quad Y^L(t)=L^{-\alpha}\mathcal{Y}^L(\rho_Lt),\qquad t\ge0,\tag{6}
\]

both evolving on $\mathbb{T}(L^{1-\alpha})$. Using computations from the proof of Proposition 6.2 in [BEV10] and the jump intensities given in (1), we find that the covariance matrix of $Y^L(1)-Y^L(0)$ is the identity matrix multiplied by

\[
\big(\text{integrals of }(x_1)^2\text{ against the small- and large-event jump kernels}\big)+o(1)=:2\sigma_L^2+o(1),\tag{7}
\]

with $\sigma_L^2$ tending to a finite limit $\sigma^2>0$ as $L\to\infty$ (by our assumption on $L^{-2\alpha}\rho_L$). The $o(1)$ remainder here is the error we make by considering $Y^L$ as evolving on $\mathbb{R}^2$ instead of $\mathbb{T}(L^{1-\alpha})$ (see the proof of Proposition 6.2 in [BEV10]). Assumption (ii) is also satisfied, and so Lemma A and its corollaries apply to $(Y^L)_{L\ge1}$, with the torus sidelength $L$ replaced by $L^{1-\alpha}$. Furthermore, $\mathcal{X}_L^{Aa}$ and $Y^L$ follow the same evolution outside $B(0,2R_B)$ for every $L$. This will be sufficient to prove Proposition 1.2. We shall show that the time the ancestral lineages of A and a need to coalesce once they are within distance $2R_BL^\alpha$ of one another (or equivalently, once $\mathcal{X}_L^{Aa}$ has entered $B(0,2R_B)$) is negligible compared to the time they need to be gathered at distance $2R_BL^\alpha$. It is therefore the 'gathering time' that dictates the coalescence time of two lineages starting at separation $|x_L|\gg L^\alpha$.

Remark 3.2. It is here that we take advantage of the form of our recombination mechanism (recall Remark …). When $X_L^{Aa}(t)\neq0$, its future evolution is determined by the homogeneous Poisson point processes of events $\Pi_S$ and $\Pi_B$, and depends only on the current separation of the two lineages. If $X_L^{Aa}(t)=0$, the situation depends upon whether the two lineages are in the same individual (that is, they have coalesced and will require a recombination event to separate again), or in two distinct individuals at the same spatial location. However, because of the form of our recombination mechanism, two lineages can jump onto the same location only if they are descendants of the same parent (in which case they necessarily coalesce). Suppose, then, that we choose our initial condition in such a way that two lineages in the same spatial location are actually in the same individual. With probability one we will then never see two lineages in distinct individuals but at the same spatial location, and so $X_L^{Aa}$ is indeed a Markov process under $\mathbb{P}_{a_L}$.

Notation 3.3. As at the beginning of the section, we assume that all $Y^L$'s are defined on the same probability space, and start at $x$ under the probability measure $\mathbb{P}_x$. Since $\mathcal{X}_L^{Aa}$ is a function of the genealogical process of A, a, B and b, we retain the notation $\mathbb{P}_{a_L}$ when referring to it. Under $\mathbb{P}_{a_L}$, $\mathcal{X}_L^{Aa}$ starts a.s. at $L^{-\alpha}x_L$ if $x_L\in\mathbb{T}(L)$ is the initial separation between lineages A and a.

The proof of Proposition 1.2 will require two subsidiary results. For each $L\in\mathbb{N}$, let $T_L^{Aa}$ be the first time the two lineages A and a are at separation less than $2R_BL^\alpha$. Equivalently, $\rho_L^{-1}T_L^{Aa}$ is the entrance time of $\mathcal{X}_L^{Aa}$ into $B(0,2R_B)$. By the observation made in the paragraph preceding Remark 3.2, $\rho_L^{-1}T_L^{Aa}$ under $\mathbb{P}_{a_L}$ has the same distribution as $T(2R_B,Y^L)$ under $\mathbb{P}_{L^{-\alpha}x_L}$, which yields the following lemma.

Lemma 3.4. Under the assumptions of Proposition 1.2, we have

\[
\lim_{L\to\infty}\mathbb{P}_{a_L}\big[T_L^{Aa}>\rho_LL^{2(t-\alpha)}\big]=\frac{\beta-\alpha}{t-\alpha}\qquad\forall\,t\in[\beta,1],\ \text{and}\tag{8}
\]
\[
\lim_{L\to\infty}\mathbb{P}_{a_L}\Big[T_L^{Aa}>\frac{1-\alpha}{2\pi\sigma^2}\,\rho_LL^{2(1-\alpha)}\log L\;t\Big]=\frac{\beta-\alpha}{1-\alpha}\,e^{-t}\qquad\forall\,t>0.\tag{9}
\]

Furthermore, for any $\beta_0\in(\alpha,1)$ and $\varepsilon>0$, the convergence in the first (resp., second) expression is uniform in $\beta,t\in[\beta_0,1]$ (resp., $\beta\in[\beta_0,1]$, $t\ge\varepsilon$) and in the sequences $(x_L)_{L\ge1}$ such that $|x_L|\in[L^\beta/(\log L),L^\beta\log L]$.

Proof of Lemma 3.4. When $\beta=1$, the results are a weaker version of Proposition 6.2 in [BEV10]. There, the convergence in (9) is uniform over $t>0$ and over the set of sequences $(x_L)_{L\ge1}$ such that $|x_L|\ge L(\log L)^{-1}$ for every $L$. Here, we relax the condition on $(x_L)_{L\ge1}$. The arguments in the proof of convergence (without requiring uniformity) only use the asymptotic behaviour of $\log|x_L|$, so they are still valid.



If $\beta<1$, the reasoning is the same as in the proofs of Lemma 3.6 in [ZCD05] and Theorem 2 in [CD02]. (Note that, as above, we allow more general sequences of initial separations at the expense of the uniformity of the convergence.) This does not come as a surprise, since the same local central limit theorem applies to both $Y^L$ (on $\mathbb{T}(L^{1-\alpha})$) and Zähle, Cox & Durrett's $Y$ (on $\mathbb{T}(L)\cap\mathbb{Z}^2$), up to some constants depending on the geometry of the geographical patches considered. Hence, since $\mathcal{X}_L^{Aa}$ starts from $L^{-\alpha}x_L$ and $\log(L^{-\alpha}|x_L|)/\log L\to\beta-\alpha$ by assumption, we can write (as in Lemma 3.6 of [ZCD05])

\[
\limsup_{L\to\infty}\sup_{\beta\le t\le\kappa_L}\Big|\mathbb{P}_{a_L}\big[T_L^{Aa}>\rho_LL^{2(t-\alpha)}\big]-\frac{\beta-\alpha}{t-\alpha}\Big|
=\limsup_{L\to\infty}\sup_{\beta\le t\le\kappa_L}\Big|\mathbb{P}_{L^{-\alpha}x_L}\big[T(2R_B,Y^L)>L^{2(t-\alpha)}\big]-\frac{\beta-\alpha}{t-\alpha}\Big|=0,
\]

where $\kappa_L=1-(\log\log L)/(2\log L)$ (so that $L^{2(\kappa_L-\alpha)}=L^{2(1-\alpha)}/(\log L)$). Now, as in Lemma 3.8 of [ZCD05], there exist $L_0\in\mathbb{N}$ and a constant $C$ such that, for every $L\ge L_0$ and $x\in\mathbb{T}(L^{1-\alpha})$,

\[
\mathbb{P}_x\Big[Y^L(s)\in B(0,2R_B)\ \text{for some } s\in\Big[\frac{L^{2(1-\alpha)}}{\log L},\,L^{2(1-\alpha)}\Big]\Big]\le\frac{C\log\log L}{\log L}.\tag{10}
\]

Combining these two results, we obtain (8).

Finally, (9) is the analogue of Theorem 2 in [CD02] and can be proved either using the same technique or in the same way as Proposition 6.2 in [BEV10] (which, in addition, gives the appropriate constant in the time-rescaling). The uniform convergence stated in the second part of Lemma 3.4 follows from a direct application of the techniques of [ZCD05] and [BEV10] cited above. □
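The value $(\beta-\alpha)/(t-\alpha)$ in (8) can be anticipated from the classical two-dimensional hitting estimate for Brownian motion. Started at distance $r$ from the origin, it leaves $B(0,R)$ before entering $B(0,\varepsilon)$ with probability $\log(r/\varepsilon)/\log(R/\varepsilon)$. The following heuristic applies this with $r=|L^{-\alpha}x_L|$, $\varepsilon=2R_B$, and $R$ of the order of the typical displacement $\sqrt{s}$ of $Y^L$ at time $s=L^{2(t-\alpha)}$. It is only a heuristic, not the argument used in the proof above. The $O(\log\log L)$ term comes from $|x_L|\in[L^\beta/\log L,L^\beta\log L]$.

```latex
\mathbb{P}_{L^{-\alpha}x_L}\big[T(2R_B,Y^L) > L^{2(t-\alpha)}\big]
  \approx \frac{\log\big(|L^{-\alpha}x_L|/(2R_B)\big)}{\log\big(L^{t-\alpha}/(2R_B)\big)}
  = \frac{(\beta-\alpha)\log L + O(\log\log L)}{(t-\alpha)\log L + O(1)}
  \;\longrightarrow\; \frac{\beta-\alpha}{t-\alpha}.
```

At $t=1$ the typical displacement reaches the size $L^{1-\alpha}$ of the torus on which $Y^L$ lives. This is why the exponential regime of (9) takes over from then on.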

The next result we need concerns the time that two lineages starting at separation at most $2R_BL^\alpha$ take to coalesce. We assume that $(\rho_LL^{-2\alpha})_{L\ge1}$ is bounded. Under this assumption, Proposition 6.4(a) in [BEV10], applied with $\psi_L:=L^\alpha$, shows that for any sequence $(\phi_L)_{L\ge1}$ tending to infinity, we have

\[
\lim_{L\to\infty}\sup_{a_L}\mathbb{P}_{a_L}\big[\tau_L^{Aa}>\phi_L\rho_L\big]=0,\tag{11}
\]

where the supremum is taken over all configurations $a_L$ such that the distance between the blocks containing A and a is at most $2R_BL^\alpha$. Observe that in [BEV10], only one individual reproduces during an event, and so if several lineages are affected by this event, they necessarily coalesce. Here, the distributions $\lambda_S$ and $\lambda_B$ of the number of potential parents are more general, but we assumed that their supports were compact. Thus, the probability that several individuals in the area of an event come from the same parent does not vanish as $L$ tends to infinity, which is all that we need to prove (11).



Remark 3.5. Since (11) shows that coming to within $2R_BL^\alpha$ is almost equivalent to coalescing for two lineages, this is the only point where the distributions $\lambda_S$ and $\lambda_B$ appear in our discussion.



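Proof of Proposition 1.2.

Equipped with these results and the corollaries of Lemma A, we can now write, for any given $t\in[\beta,1]$,

\[
\begin{aligned}
\mathbb{P}_{a_L}\big[\tau_L^{Aa}>\rho_LL^{2(t-\alpha)}\big]
={}&\mathbb{P}_{a_L}\big[\tau_L^{Aa}>\rho_LL^{2(t-\alpha)};\ T_L^{Aa}>\rho_L(L^{2(t-\alpha)}-\log L)\big]\\
&+\mathbb{P}_{a_L}\big[\tau_L^{Aa}>\rho_LL^{2(t-\alpha)};\ T_L^{Aa}\le\rho_L(L^{2(t-\alpha)}-\log L)\big].
\end{aligned}\tag{12}
\]

The second term on the right-hand side of (12) tends to zero by the strong Markov property applied at time $T_L^{Aa}$ and (11) with $\phi_L=\log L$. Then, we have, for each $L$,

\[
\Big|\mathbb{P}_{a_L}\big[\tau_L^{Aa}>\rho_LL^{2(t-\alpha)};\ T_L^{Aa}>\rho_L(L^{2(t-\alpha)}-\log L)\big]-\mathbb{P}_{a_L}\big[T_L^{Aa}>\rho_LL^{2(t-\alpha)}\big]\Big|
\le\mathbb{P}_{L^{-\alpha}x_L}\big[L^{2(t-\alpha)}-\log L<T(2R_B,Y^L)\le L^{2(t-\alpha)}\big].
\]

This tends to zero by Lemma 3.1 applied with $L$ replaced by $L^{1-\alpha}$ (the size of the torus on which $Y^L$ evolves) if $t<1$, and by (10) if $t=1$. Lemma 3.4 enables us to deduce a).

For b), the same technique applies, but with the last argument replaced by the use of Lemma B. □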

Proposition 1.2 is, in fact, a particular case of a more general result which we shall use in §4.3 (with $k=4$). Suppose we follow the ancestry at one locus of $k\ge2$ different individuals. By analogy with the above, we label the individuals $1,\ldots,k$. We write $x_{ij}^L$ for the initial separation of lineages $i$ and $j$, $T_{ij}^L$ for the time at which their ancestral lineages first come within $2R_BL^\alpha$, and $\tau_{ij}^L$ for their coalescence time. We also write $T_L$ (resp., $\tau_L$) for the minimum over $\{i\neq j\}$ of the $T_{ij}^L$'s (resp., the $\tau_{ij}^L$'s). In the same way as above, we could state a result for a more general sequence $(a_L)_{L\ge1}$ of initial configurations. However, for the proof of Theorem 1.4 we shall need some uniformity in the convergence. For this reason, we consider $\Gamma(L,k,\eta)$, the set of all configurations of $k$ lineages on $\mathbb{T}(L)$ such that all pairwise distances $|x_{ij}^L|$ belong to $[L^\eta/(\log L),L^\eta\log L]$.

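Proposition 3.6. For any $\beta\in(\alpha,1]$, $\varepsilon>0$ and $i\neq j$, we have

\[
\limsup_{L\to\infty}\ \sup_{\beta\le\eta\le t\le1}\ \sup_{a_L\in\Gamma(L,k,\eta)}\Big|\mathbb{P}_{a_L}\big[T_L>\rho_LL^{2(t-\alpha)}\big]-\Big(\frac{\eta-\alpha}{t-\alpha}\Big)^{\binom{k}{2}}\Big|=0,
\]
\[
\limsup_{L\to\infty}\ \sup_{t\ge\varepsilon,\ \beta\le\eta\le1}\ \sup_{a_L\in\Gamma(L,k,\eta)}\Big|\mathbb{P}_{a_L}\Big[T_L>\frac{1-\alpha}{2\pi\sigma^2}\,\rho_LL^{2(1-\alpha)}\log L\;t\Big]-\Big(\frac{\eta-\alpha}{1-\alpha}\Big)^{\binom{k}{2}}e^{-\binom{k}{2}t}\Big|=0.
\]

The same is true with $T_L$ replaced by $\tau_L$.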

In essence, Proposition 3.6 tells us that on the timescale $\rho_LL^{2(t-\alpha)}$, $t\in[\eta,1]$, the time of the first coalescence (or of the first 'gathering') is approximately the same as that of the first merger in a Kingman coalescent time-changed by $\log\big((t-\alpha)/(\eta-\alpha)\big)$. The approximation is uniform over $\eta$'s bounded away from $\alpha$. Moreover, asymptotically, just as in the Kingman coalescent, each pair of lineages has the same chance to be the first to coalesce. On the other hand, on the timescale $\frac{1-\alpha}{2\pi\sigma^2}\rho_LL^{2(1-\alpha)}\log L\;t$, conditional on $T_L>\rho_LL^{2(1-\alpha)}$, the asymptotic behaviour corresponds to Kingman's coalescent run at speed 1.
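The link with Kingman's coalescent in the first regime is the elementary identity below. The first merger time of Kingman's coalescent started with $k$ lineages is exponential with rate $\binom{k}{2}$. Evaluating its tail at the time $\log\big((t-\alpha)/(\eta-\alpha)\big)$ therefore reproduces the limit in the first display of Proposition 3.6.

```latex
\Big(\frac{\eta-\alpha}{t-\alpha}\Big)^{\binom{k}{2}}
  = \exp\Big(-\binom{k}{2}\,\log\frac{t-\alpha}{\eta-\alpha}\Big)
  = \mathbb{P}\Big[\mathrm{Exp}\big(\tbinom{k}{2}\big) > \log\frac{t-\alpha}{\eta-\alpha}\Big],
  \qquad t\in[\eta,1].
```

At $t=1$ this survival probability equals $\big(\frac{\eta-\alpha}{1-\alpha}\big)^{\binom{k}{2}}$, the prefactor in the second display. From then on the merger clock runs linearly on the $\log L$ timescale.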



Sketch of proof. The proof of Proposition 3.6 is a straightforward adaptation of those of Lemma 4.2 and of Lemma 5.2 in [ZCD05] (see also the comments given in the paragraph following the proof of Lemma 4.2 there). The interested reader will also find there references to earlier results for the random walks with instantaneous coalescence which are dual to the two-dimensional voter model. □



Let us end this section by recalling a lemma of [BEV10] and by stating an analogous result. For every $L\in\mathbb{N}$, $i\neq j$ and $t\ge0$, let $X_{ij}^L(t)$ be the separation (on $\mathbb{T}(L)$ at time $t$) of lineages $i$ and $j$.
Lemma C [6.9 in [BEV10]] Suppose $k=4$ and

\[
\lim_{L\to\infty}\frac{\min_{i\neq j}\log|x_{ij}^L|}{\log L}=\lim_{L\to\infty}\frac{\max_{i\neq j}\log|x_{ij}^L|}{\log L}=1.\tag{13}
\]

Then, […]

These results are also true if $T_L$ is replaced by $\tau_L$.

In words, when two lineages meet and coalesce, with probability tending to one the others are at distance at least $L/\log L$ from each other and from the coalescing pair (in particular, such a merger involves at most two lineages at a time). When the initial distance between the lineages is of the order of $L^\beta$ with $\beta<1$, we have instead:

Lemma 3.7. Suppose again $k=4$ and the limit in (13) is equal to $\beta\in(\alpha,1)$. Then, […]

The result is also true if $T_L$ is replaced by $\tau_L$.

Notice the rescalings of time by $\rho_L$ and space by $L^\alpha$ introduced in (6), under which the behaviour of the lineages is close to that of finite variance random walks. In fact, although their formulations are rather different, Lemma 3.7 is very similar to Lemma 1 in [CG86] or Lemma 5.1 in [ZCD05] for coalescing random walks.



Sketch of proof of Lemma 3.7. The method of proof is identical to that of Lemma 6.9 in [BEV10], to which we refer for more complete arguments. It is based on two facts. First, by time $\rho_LL^{2(1-\alpha)}/(\log L)$ the separation of the lineages is never of the order of the side of the torus. Second, suppose that $T_L=T_{12}^L$. Then the processes $L^{-\alpha}X_{kl}^L(\rho_L\,\cdot)$, $\{k,l\}\neq\{1,2\}$, considered separately, follow the same law as the difference of two independent lineages (on $\mathbb{R}^2$, by the first fact), conditioned on not entering $B(0,2R_B)$ before $T_L/\rho_L$. By Lemma 3.4, with high probability $T_L/\rho_L$ is at least of order $L^{2(\beta-\alpha)}$, and so the result for $T_L$ follows from a standard central limit theorem.

The modifications needed for $\tau_L$ use the very rapid coalescence of two lineages gathered at distance $2R_BL^\alpha$. This shows that, with probability tending to 1, if $\tau_L=\tau_{12}^L$ then no other pair of lineages comes within $2R_BL^\alpha$ of one another before time $\tau_L$. An application of Lemma 3.7 (with $T_L$) completes the proof. □



4 Genealogies at two loci 

From now on, we work with the rescaling of time and space introduced in (6). As we saw in the previous section, these are the appropriate scales on which to understand the behaviour of a collection of independent processes following the dynamics driven by (1). Our lineages move independently as long as they are at distance greater than $2R_B$ (in rescaled units) from each other. This is therefore also the relevant regime in which to understand 'gathering' and coalescence of ancestral lineages.

The aim of §4.1 and §4.2 is to understand how two lineages, initially present in the same individual, can 'decorrelate' and how much time they need to do so. Once this phenomenon is understood for two lineages, we can consider the more complex situation described in §1.4 and prove Theorems 1.4 and 1.5. This is achieved in §4.3.



4.1 Effective recombination time 

For every $L$, let $\mathcal{X}_L^{AB}$ be the process that records the (rescaled) difference between the locations of the lineages labelled A and B. Recall that under our working assumptions, these lineages start within the same individual (in other words, A and B belong to the same block of the marked partition $a_L$).

By construction, recombination occurs only during small events. In our rescaled space and time units, a recombination event results in a separation of the lineages of $O(L^{-\alpha})$, and then small events affect them at rate $O(\rho_L)$. Hence, it is very likely that (in our rescaled time units) the lineages very rapidly coalesce. They then have to wait for the next recombination event (that is, roughly $(\rho_Lr_L)^{-1}$ units of rescaled time) to be geographically separated again, and so on. An efficient way for the lineages to escape this 'flickering' due to small events is for a large event to send them to a separation of $O(1)$. This necessarily occurs at a time when $\mathcal{X}_L^{AB}\neq0$. Thus, let us define $S_L^{AB}$ as the first time $t$ at which at least one of the two lineages is affected by a large event and $\mathcal{X}_L^{AB}(t-)\neq0$ (which does not prohibit $\mathcal{X}_L^{AB}(t)=0$). We call $S_L^{AB}$ the effective recombination time. Its large-$L$ behaviour is given by the following proposition.

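Proposition 4.1. There exist $b_1,b_2>0$ such that for every $b>b_2$ and every non-vanishing sequence $(\phi_L)_{L\ge1}$ satisfying $\phi_L\le L^2/(\rho_L\log L)$ for every $L$, we have, for $L$ large enough,

\[
\mathbb{P}_{a_L}\Big[S_L^{AB}>\phi_L\Big(1+\frac{b\log(\phi_L\rho_L)}{r_L\rho_L}\Big)\Big]\le e^{-b_1\phi_L}.
\]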


The idea of the proof of Proposition 4.1 is to show the following. With very high probability, the number of visits to $0$ of $\mathcal{X}_L^{AB}$ before it has accumulated a time $\phi_L$ outside $0$ is less than a constant times $\phi_L\log(\phi_L\rho_L)$. Since each visit lasts a time proportional to $(r_L\rho_L)^{-1}$, the total amount of time it takes for $\mathcal{X}_L^{AB}$ to accumulate $\phi_L$ units of time outside zero is at most of the order of $\phi_L+\phi_L\log(\phi_L\rho_L)/(r_L\rho_L)$. The probability that by this time the two lineages have not been affected by a large event while in distinct locations is bounded by a quantity of the form $e^{-C_B\phi_L}$.
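The heuristic above can be explored with a toy renewal model. This model is our simplification and not the process $\mathcal{X}_L^{AB}$ itself. The lineages alternate between two kinds of period:

- periods at separation $0$, of exponential length with rate `rec_rate`, standing in for $C_{\rm rec}r_L\rho_L$;
- excursions away from $0$, whose lengths have the heavy tail $\mathbb{P}[E>u]=1/(1+\log(1+u))$. This is a hypothetical choice mimicking the $1/\log u$ tail of Lemma 4.2.

Large events hit at rate `c_B` only while the lineages are apart. They therefore form a Poisson process on the clock of accumulated time apart, so the toy effective recombination time is the real time at which that clock reaches an independent $\mathrm{Exp}(c_B)$ level.

```python
import math
import random


def excursion_length(rng):
    """Sample E with P[E > u] = 1 / (1 + log(1 + u)) by inversion (capped to avoid overflow)."""
    v = 1.0 - rng.random()  # uniform on (0, 1]
    return math.expm1(min(1.0 / v - 1.0, 50.0))


def effective_recombination_time(rec_rate, c_B, rng):
    """Return (S, A) for the toy model: S is the effective recombination time and
    A ~ Exp(c_B) is the total time spent apart that it required."""
    apart_needed = rng.expovariate(c_B)
    remaining = apart_needed
    t = 0.0
    while True:
        t += rng.expovariate(rec_rate)   # time at separation 0 (lineages start together)
        e = excursion_length(rng)
        if e >= remaining:                # a large event hits during this excursion
            return t + remaining, apart_needed
        t += e
        remaining -= e
```

The difference $S-A$ is the total time spent at separation $0$, which is the quantity controlled by the number of visits in the argument above.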

Let us write $\mathcal{R}_L(x)$ for the rate at which at least one of the lineages is affected by a large event when $\mathcal{X}_L^{AB}=x$, and recall that time is rescaled by a factor $\rho_L$. From the expression for the intensity of $\Pi_B$, we can find a constant $C_B>0$ such that $\mathcal{R}_L(x)\ge C_B$ for all $x\in\mathbb{T}(L^{1-\alpha})\setminus\{0\}$. (In fact, one can even show that the function $x\mapsto\mathcal{R}_L(x)$ is increasing in $|x|$, so that $C_B$ can be chosen as an explicit positive constant.) Let $X^L$ be a $\mathbb{T}(L^{1-\alpha})$-valued Markov process distributed in the same way as the difference between two lineages subject only to the events of $\Pi_S$. Let $S^L$ be an exponential random variable with instantaneous rate $\mathcal{R}_L(X^L(t))\mathbf{1}_{\{X^L(t)\neq0\}}$. By the preceding remark, $S^L$ is stochastically bounded from above by an exponential random variable with instantaneous rate $C_B\mathbf{1}_{\{X^L(t)\neq0\}}$. Because large events have no effect when $\mathcal{X}_L^{AB}=0$, the law of the stopped process $\{\mathcal{X}_L^{AB}(t),\ t\in[0,S_L^{AB}]\}$ is the same as that of $\{X^L(t),\ t\in[0,S^L]\}$. Thus, for the proof of Proposition 4.1 we work with $X^L$ and $S^L$, and use $\mathbb{P}_x$ to denote the law of $X^L$ under which $\mathbb{P}[X^L(0)=x]=1$.

For each $L\in\mathbb{N}$, let us define the stopping times $(Q_i^L)_{i\ge0}$ and $(q_i^L)_{i\ge0}$ by $Q_0^L=q_0^L=0$ and, for every $i\ge1$,

\[
Q_i^L:=\inf\{t\ge q_{i-1}^L:\ X^L(t)\neq0\},\qquad q_i^L:=\inf\{t>Q_i^L:\ X^L(t)=0\}.
\]

(Note that $Q_1^L=0$ if $X^L(0)\neq0$, in which case $q_1^L$ is the first hitting time of $0$.) By construction, the random variables $(Q_i^L-q_{i-1}^L)_{i\in\mathbb{N}}$ are i.i.d. and distributed according to an exponential random variable with parameter $C_{\rm rec}r_L\rho_L$. Here $C_{\rm rec}>0$ is a constant whose last factor is $1-\lambda_S(\{1\})$, since the number of reproducing individuals needs to be greater than one for recombination to occur. We have the following result for the excursions of $X^L$ away from $0$.
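To make the definitions of $(Q_i^L)$ and $(q_i^L)$ concrete, here is a small helper that reads them off a right-continuous piecewise-constant path. The helper and its input format are our own, hypothetical choices. The path is given as a time-ordered list of `(time, value)` pairs, where `value` is the position held from `time` until the next record. The holding times at $0$ are the differences $Q_i^L-q_{i-1}^L$ and the excursion lengths are $q_i^L-Q_i^L$.

```python
def excursion_times(path):
    """Return lists (Q, q) with Q[0] = q[0] = 0 and, for i >= 1,
    Q[i] = inf{t >= q[i-1] : X(t) != 0},  q[i] = inf{t > Q[i] : X(t) = 0}.

    `path` is a list of (t, x) pairs with increasing t, where x is a tuple of
    coordinates giving the value of X on [t, next t).  An excursion that has
    not returned to 0 by the end of the path yields a Q without matching q.
    """
    Q, q = [0.0], [0.0]
    waiting_to_leave_zero = True
    for t, x in path:
        at_zero = all(c == 0 for c in x)
        if waiting_to_leave_zero and not at_zero:
            Q.append(t)
            waiting_to_leave_zero = False
        elif not waiting_to_leave_zero and at_zero:
            q.append(t)
            waiting_to_leave_zero = True
    return Q, q


path = [(0.0, (0, 0)), (1.5, (0.1, 0.0)), (2.0, (0.3, 0.2)),
        (3.0, (0, 0)), (4.25, (0.0, -0.1)), (5.0, (0, 0))]
Q, q = excursion_times(path)  # Q == [0.0, 1.5, 4.25], q == [0.0, 3.0, 5.0]
holding = [Q[i] - q[i - 1] for i in range(1, len(q))]     # [1.5, 1.25]
excursions = [q[i] - Q[i] for i in range(1, len(q))]      # [1.5, 0.75]
```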

Lemma 4.2. There exist $C_e>0$ and $u_e>0$ such that for every $L\ge1$ and $u_e\le u\le L^2/(\log L)$, and for every $x\in B(0,2R_SL^{-\alpha})\setminus\{0\}$,

\[
\mathbb{P}_x\big[\rho_L\,q_1^L>u\big]\ge\frac{C_e}{\log u}.
\]

Proof of Lemma 4.2. Here (and only here) it is easier to work with the initial time and space units and show that the probability of an excursion outside $0$ of length greater than $u$ is bounded from below by $C_e/(\log u)$ when $u$ is large. Let us thus define $\tilde X^L$ by $\tilde X^L(t):=L^\alpha X^L(\rho_L^{-1}t)$ for all $t\ge0$, with the understanding that $\tilde X^L$ starts at $L^\alpha x$ under the probability measure $\mathbb{P}_x$.

The desired result is shown in [RR66] in two dimensions, both for standard discrete space random walks whose jumps have finite variance and for Brownian motion (with the hitting time of $0$ replaced by the entrance time into a ball of fixed radius). To see why it is true for $\tilde X^L$ on $\mathbb{T}(L)$, observe first that by time $L^2/(\log L)$, the process $\tilde X^L$ does not see that space is limited, and so it behaves as though it were moving in $\mathbb{R}^2$. More precisely, there exists a constant $C>0$ such that for all $z\in B(0,2R_S)$,

\[
\mathbb{P}_z\Big[\sup_{u\le L^2/(\log L)}|\tilde X^L(u)|>\frac{L}{8}\Big]\le\frac{C}{\log L}.
\]

(Use the $L^2$-maximal inequality and the fact that $\mathbb{E}_z\big[|\tilde X^L(L^2/\log L)|^2\big]$ is bounded by the corresponding quantity for the same process defined on $\mathbb{R}^2$, which is proportional to $L^2/(\log L)$ by Equation (22) in [BEV10].) Hence, let us assume that $\tilde X^L$ is defined on $\mathbb{R}^2$ instead of $\mathbb{T}(L)$. The evolution due to small events depends on $L$ only through the torus sidelength. Hence, with our new convention, all $\tilde X^L$'s have the same distribution and we can drop the exponent $L$ in the notation. For the same reason, we also write $\tilde q_1$ for the random time $\rho_Lq_1^L$, that is, the length of the first excursion of $\tilde X$ outside $0$.

Let $T_{(4R_s)}$ denote the first time $X$ leaves $B(0, 4R_s)$ (so that $X(T_{(4R_s)}) \in B(0, 6R_s)\setminus B(0, 4R_s)$ by our assumption on the jump sizes), and let $T_{[2R_s]}$ be the first return time of $X$ into $B(0, 2R_s)$ after $T_{(4R_s)}$. We have for every $x \in B(0, 2R_s)\setminus\{0\}$,

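$$\begin{aligned}
\mathbb{P}_x[q_1 > u] &\ge \mathbb{P}_x\big[q_1 > u\,;\, T_{(4R_s)} < q_1\big]\\
&\ge \mathbb{P}_x\big[q_1 - T_{(4R_s)} > u\,;\, T_{(4R_s)} < q_1\big]\\
&\ge \mathbb{E}_x\Big[\mathbf{1}_{\{T_{(4R_s)} < q_1\}}\,\mathbb{P}_{X(T_{(4R_s)})}\big[T_{[2R_s]} > u\big]\Big]\\
&\ge \inf_{y \in B(0,2R_s)\setminus\{0\}}\mathbb{P}_y\big[T_{(4R_s)} < q_1\big] \times \inf_{z \in B(0,6R_s)\setminus B(0,4R_s)}\mathbb{P}_z\big[T_{[2R_s]} > u\big]. \qquad (14)
\end{aligned}$$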

The first infimum is strictly positive. To see this, note that $\mathbb{P}_y[T_{(4R_s)} < q_1]$ is bounded from below by the probability that the first four small events affecting the lineages send them to a distance at least $4R_s$ of each other before they coalesce, and the infimum over $B(0, 2R_s)\setminus\{0\}$ of the latter probability is positive since $u_s < 1$ (if $u_s = 1$, only one of the lineages can be in the geographical range of such separating events, and so their probability of occurrence shrinks to 0 as $|y| \to 0$).

For the second infimum in (14), we use the same construction as in the proof of the Skorokhod embedding theorem (see e.g. [Bil95]) to write the path of $X$ as that of a standard Brownian motion $W$ considered at particular times. More precisely, if $(\sigma_i)_{i\in\mathbb{N}}$ is the sequence of jump times of $X$, we can find a sequence of Brownian stopping times $(\tau_i)_{i\in\mathbb{N}}$ such that $(W(\tau_i))_{i\ge0}$ has the same joint distributions as $(X(\sigma_i))_{i\ge0}$. For every $i \in \mathbb{N}$, conditional on $W(\tau_{i-1})$, $\tau_i$ is the first time greater than $\tau_{i-1}$ at which $W$ leaves $B(W(\tau_{i-1}), l_i)$, where the random variable $l_i$ is independent of $W$ and of $\{\tau_j, j < i\}$ and has the same distribution as the length of the first jump of $X$. As a consequence, if $h(u) := \max\{i : \sigma_i \le u\}$, by comparing the paths of $X$ and of $W$ we obtain

Now, each $\sigma_i - \sigma_{i-1}$ is stochastically bounded from below by an exponential random variable with parameter $k_1 > 0$, and so by standard large deviation results we can find $k_2 > 0$ large enough and $k_3 > 0$ such that for all $u \ge 1$ and $y \in \mathbb{R}^2$,

$$\mathbb{P}_y\big[h(u) > k_2u\big] \le e^{-k_3u}.$$

By construction, each $\tau_i - \tau_{i-1}$ is stochastically bounded from above by the first time a Brownian motion started at 0 leaves $B(0, 2R_s)$, which also has an exponential moment. Hence, there exist $k_4, k_5 > 0$ such that for all $u \ge 1$ and $y \in \mathbb{R}^2$,

Using these bounds and the result already established in [RR66] for Brownian motion at time $k_4u$, Lemma 4.2 is proved. □

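Lemma 4.2 is an instance of the two-dimensional phenomenon that an excursion away from a point lasts longer than $u$ with probability of order $1/\log u$. As a hedged illustration only (it uses a simple random walk on $\mathbb{Z}^2$ rather than the process $X$ of the text, and the function name is hypothetical), the following Monte Carlo sketch estimates the probability that the walk has not come back to the origin after $n$ steps and compares it with the classical Dvoretzky–Erdős asymptotic $\pi/\log n$ for that walk.

```python
import math
import random

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def no_return_fraction(n_steps, n_walks, seed=0):
    """Monte Carlo estimate of P[a 2D simple random walk started at the origin
    does not come back to the origin during its first n_steps steps]."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(n_walks):
        x = y = 0
        for _ in range(n_steps):
            dx, dy = rng.choice(MOVES)
            x += dx
            y += dy
            if x == 0 and y == 0:
                break  # the excursion away from 0 has ended
        else:
            survivors += 1  # no return within n_steps steps
    return survivors / n_walks

if __name__ == "__main__":
    for n in (64, 256, 1024):
        p = no_return_fraction(n, 1000)
        # compare with pi / log n; the product p * log n should stay bounded away from 0
        print(f"n={n:5d}  estimate={p:.3f}  pi/log n={math.pi / math.log(n):.3f}  "
              f"estimate*log n={p * math.log(n):.2f}")
```

The product of the estimate with $\log n$ staying bounded away from zero is the discrete analogue of the lower bound $C_\theta/\log u$; convergence to the constant $\pi$ is slow, so no exact agreement should be expected at these sizes.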
We now have all the ingredients we require to prove Proposition 4.1.

Proof of Proposition 4.1. Set

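$$\psi_L := \varphi_L\Big(1 + \frac{2\theta_2\log(\varphi_L\rho_L)}{r_L\rho_L}\Big) \qquad (15)$$

and call $t(\psi_L)$ the time $X^L$ spends away from 0 before time $\psi_L$. We have, for every $L$,

$$\begin{aligned}
\mathbb{P}_0[S^L > \psi_L] &= \mathbb{P}_0\big[S^L > \psi_L\,;\, t(\psi_L) < \varphi_L\big] + \mathbb{P}_0\big[S^L > \psi_L\,;\, t(\psi_L) \ge \varphi_L\big]\\
&\le \mathbb{P}_0\big[t(\psi_L) < \varphi_L\big] + e^{-C_b\varphi_L},
\end{aligned}$$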

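where $C_b$ is the lower bound on the rate of effective large events introduced just below the statement of the proposition. Next, if we set $k_L := \sup\{i : Q_i^L < \psi_L\}$, that is, $k_L$ is the number of excursions of $X^L$ away from 0 which start before time $\psi_L$, we can write

$$\begin{aligned}
\mathbb{P}_0\big[t(\psi_L) < \varphi_L\big] &= \mathbb{P}_0\big[t(\psi_L) < \varphi_L\,;\, k_L < \varphi_L\log(\varphi_L\rho_L)\big]\\
&\quad + \mathbb{P}_0\big[t(\psi_L) < \varphi_L\,;\, k_L \ge \varphi_L\log(\varphi_L\rho_L)\big].
\end{aligned}$$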



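On the one hand,

$$\begin{aligned}
\mathbb{P}_0\big[t(\psi_L) < \varphi_L\,;\, k_L \ge \varphi_L\log(\varphi_L\rho_L)\big] &\le \mathbb{P}_0\Big[\bigcap_{i=1}^{\lfloor\varphi_L\log(\varphi_L\rho_L)\rfloor}\big\{q_i^L - Q_i^L < \varphi_L\big\}\Big]\\
&\le \Big(\sup_{x \in B(0,2R_sL^{-\alpha})\setminus\{0\}}\mathbb{P}_x\big[q_1^L \le \varphi_L\big]\Big)^{\lfloor\varphi_L\log(\varphi_L\rho_L)\rfloor}\\
&\le \Big(1 - \frac{C_\theta}{\log(\varphi_L\rho_L)}\Big)^{\lfloor\varphi_L\log(\varphi_L\rho_L)\rfloor}\\
&\le e^{-C_\theta'\varphi_L}
\end{aligned}$$

for a constant $C_\theta' > 0$ and $L$ large enough. The second line is obtained by an obvious recursion using the strong Markov property at the successive times $q_i^L$ in decreasing order, and the third line uses Lemma 4.2 (recall that by assumption on $\varphi_L$, we have $\varphi_L\rho_L \to \infty$ and $\varphi_L\rho_L \le L^2/(\log L)$). Hence, we can set $\theta_1 := C_b \wedge C_\theta'$. On the other hand,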



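$$\begin{aligned}
\mathbb{P}_0\big[t(\psi_L) < \varphi_L\,;\, k_L < \varphi_L\log(\varphi_L\rho_L)\big] &\le \mathbb{P}_0\Big[\sum_{i=1}^{\lfloor\varphi_L\log(\varphi_L\rho_L)\rfloor+1}(Q_i^L - q_{i-1}^L) > \psi_L - \varphi_L\Big]\\
&\le \frac{\mathbb{E}_0\Big[\exp\Big\{r_L\rho_L\sum_{i=1}^{\lfloor\varphi_L\log(\varphi_L\rho_L)\rfloor+1}(Q_i^L - q_{i-1}^L)\Big\}\Big]}{\exp\big\{r_L\rho_L(\psi_L - \varphi_L)\big\}}\\
&\le e^{-2\theta_2\varphi_L\log(\varphi_L\rho_L)}\prod_{i=1}^{\lfloor\varphi_L\log(\varphi_L\rho_L)\rfloor+1}\mathbb{E}_0\Big[\exp\big\{r_L\rho_L(Q_i^L - q_{i-1}^L)\big\}\Big],
\end{aligned}$$

where the last two lines use the Markov inequality and the independence of the increments $Q_i^L - q_{i-1}^L$. As we pointed out above, the random variables $r_L\rho_L(Q_i^L - q_{i-1}^L)$ are i.i.d. with law $\mathrm{Exp}(C_{\mathrm{rec}})$. Therefore, for a suitable choice of the constant $\theta_2 > 0$ in (15), we can write

$$\mathbb{P}_0\big[t(\psi_L) < \varphi_L\,;\, k_L < \varphi_L\log(\varphi_L\rho_L)\big] \le e^{-\theta_2\varphi_L\log(\varphi_L\rho_L)}.$$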

Combining these results, the proof of Proposition 4.1 is complete. □

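The last display rests on the exponential Markov (Chernoff) inequality for a sum of i.i.d. exponential variables. The sketch below, with hypothetical helper names, evaluates the bound $\mathbb{P}[\mathcal{E}_1 + \dots + \mathcal{E}_n > x] \le e^{-\lambda x}\,(C/(C-\lambda))^n$ for $\mathcal{E}_i \sim \mathrm{Exp}(C)$ and $0 < \lambda < C$, and compares it with a Monte Carlo estimate. Note that the exponential moment $\mathbb{E}[e^{\lambda\mathcal{E}_1}] = C/(C-\lambda)$ is finite only when $\lambda < C$; after the time change by $r_L\rho_L$, the display above uses $\lambda = 1$ and $C = C_{\mathrm{rec}}$.

```python
import math
import random

def chernoff_bound(n, x, rate, lam):
    """Markov-inequality bound P[E_1 + ... + E_n > x] <= exp(-lam*x) * E[exp(lam*E_1)]**n
    for i.i.d. E_i ~ Exp(rate); valid only for 0 < lam < rate."""
    if not 0 < lam < rate:
        raise ValueError("need 0 < lam < rate for a finite exponential moment")
    mgf = rate / (rate - lam)  # E[exp(lam * E)] for E ~ Exp(rate)
    return math.exp(-lam * x + n * math.log(mgf))

def tail_estimate(n, x, rate, trials=20000, seed=0):
    """Monte Carlo estimate of P[E_1 + ... + E_n > x] with E_i ~ Exp(rate)."""
    rng = random.Random(seed)
    hits = sum(sum(rng.expovariate(rate) for _ in range(n)) > x for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    n, x, rate = 10, 8.0, 2.0
    print("Monte Carlo estimate:", tail_estimate(n, x, rate))
    for lam in (0.5, 0.75, 1.0, 1.5):
        print(f"lambda={lam}: bound={chernoff_bound(n, x, rate, lam):.4f}")
```

The value of $\lambda$ matters: too large a $\lambda$ makes the moment factor explode and the bound becomes trivial, which is why the exponent $2\theta_2$ in (15) has to be large enough to absorb the product of exponential moments.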
Finally, let us use Proposition 4.1 to obtain some estimates on the time two lineages starting in the same individual need to reach a separation at which they start to evolve independently. The following lemma will be a key result for the proof of Proposition 4.4 in the next section. For every $L \in \mathbb{N}$, let $T^{(3R_B)}$ denote the exit time of $X_{AB}^L$ from $B(0, 3R_B)$.

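Lemma 4.3. There exists a constant $\theta_3 > 0$ such that, if $(\varphi_L)_{L\ge1}$ is as in Proposition 4.1 and $\theta_2$ is the constant appearing there, there exists $L_0 = L_0((\varphi_L)_{L\in\mathbb{N}})$ such that for every $L \ge L_0$,

$$\mathbb{P}_0\Big[T^{(3R_B)} > \varphi_L\Big(1 + \frac{2\theta_2\log(\varphi_L\rho_L)}{r_L\rho_L}\Big)\Big] \le \sqrt{\varphi_L}\,e^{-\theta_3\sqrt{\varphi_L}}.$$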

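Proof of Lemma 4.3. For conciseness, we again use the notation $\psi_L$ introduced in (15). This time we define $Q_0^L = q_0^L := 0$ and, for every $i \ge 1$,

$$\begin{aligned}
Q_i^L &:= \inf\{t > q_{i-1}^L : t \text{ is the epoch of an effective recombination}\},\\
q_i^L &:= \inf\{t > Q_i^L : X_{AB}^L(t) = 0 \text{ or } X_{AB}^L(t) \notin B(0, 3R_B)\},
\end{aligned}$$

and

$$k_L := \max\{i : Q_i^L < T^{(3R_B)}\}.$$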



First, we claim that there exists a constant $p > 0$ independent of $L$ such that, for $L$ large enough, $k_L + 1$ is stochastically bounded by a geometric random variable with success probability $p$. In other words, the probability that $X_{AB}^L$ starting at $x \in B(0, 3R_B)\setminus\{0\}$ leaves $B(0, 3R_B)$ before hitting 0 is bounded from below by $p$, independently of $x$. The proof of this claim is given in the first paragraph of the proof of Lemma 6.6 in [BEV10]. (The quantity $p$ is taken to be the probability that a sequence of large events sends the lineages to a distance of at least $3R_B$ without meanwhile being counteracted by small events bringing them too close together.) As a consequence, for any large $L$,

$$\mathbb{P}_0\big[k_L \ge \lfloor\sqrt{\varphi_L}\rfloor\big] \le (1-p)^{\lfloor\sqrt{\varphi_L}\rfloor}.$$
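Next, let us write

$$\begin{aligned}
\mathbb{P}_0\big[T^{(3R_B)} > \psi_L\,;\, k_L < \lfloor\sqrt{\varphi_L}\rfloor\big] &\le \mathbb{P}_0\Big[\sum_{i=1}^{\lfloor\sqrt{\varphi_L}\rfloor}(Q_i^L - q_{i-1}^L) > \frac{\psi_L}{2}\Big] \qquad (16)\\
&\quad + \mathbb{P}_0\Big[\sum_{i=1}^{\lfloor\sqrt{\varphi_L}\rfloor}(q_i^L - Q_i^L) > \frac{\psi_L}{2}\Big]. \qquad (17)
\end{aligned}$$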



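The quantity in (16) is bounded by

$$\begin{aligned}
\mathbb{P}_0\Big[\exists\, i \le \lfloor\sqrt{\varphi_L}\rfloor : Q_i^L - q_{i-1}^L > \frac{\psi_L}{2\sqrt{\varphi_L}}\Big] &= 1 - \mathbb{P}_0\Big[\forall\, i \le \lfloor\sqrt{\varphi_L}\rfloor : Q_i^L - q_{i-1}^L \le \frac{\psi_L}{2\sqrt{\varphi_L}}\Big]\\
&\le 1 - \Big(1 - \sup\,\mathbb{P}\Big[S^L > \frac{\psi_L}{2\sqrt{\varphi_L}}\Big]\Big)^{\lfloor\sqrt{\varphi_L}\rfloor}, \qquad (18)
\end{aligned}$$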



where the last line is obtained by recursion (notice that, conditionally on $q_{i-1}^L$, the increment $Q_i^L - q_{i-1}^L$ has the same law as the effective recombination time $S^L$) and the supremum is taken over all initial configurations in which lineages $A$ and $B$ are either at distance 0 or at distance greater than $3R_B$. We can in fact restrict our attention to the set of configurations in which $A$ and $B$ belong to the same block. Indeed, if $|X_{AB}^L(0)| > 3R_B$, we can decompose the probability that $S^L > \psi_L/(2\sqrt{\varphi_L})$ into the sum of

• the probability that $S^L > \psi_L/(2\sqrt{\varphi_L})$ and $X_{AB}^L$ does not hit 0 before time $\psi_L/(4\sqrt{\varphi_L})$, which decays exponentially fast in $\sqrt{\varphi_L}$ since the rate at which large events affect the lineages when $X_{AB}^L \neq 0$ is bounded from below by a positive constant;

• the probability that $S^L > \psi_L/(2\sqrt{\varphi_L})$ and $X_{AB}^L$ hits 0 before time $\psi_L/(4\sqrt{\varphi_L})$, which boils down to the case $X_{AB}^L(0) = 0$ by the strong Markov property applied at the first time $X_{AB}^L = 0$.



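Now, by Proposition 4.1 applied with $\varphi_L$ replaced by $\sqrt{\varphi_L}/2$, we have

$$\begin{aligned}
\mathbb{P}_0\Big[S^L > \frac{\psi_L}{2\sqrt{\varphi_L}}\Big] &\le \mathbb{P}_0\Big[S^L > \frac{\sqrt{\varphi_L}}{2}\Big(1 + \frac{2\theta_2\log(\sqrt{\varphi_L}\rho_L/2)}{r_L\rho_L}\Big)\Big]\\
&\le 2e^{-(\theta_1/2)\sqrt{\varphi_L}} + e^{-(\theta_2/2)\sqrt{\varphi_L}\log(\sqrt{\varphi_L}\rho_L/2)}.
\end{aligned}$$

Substituting in (18) and using the asymptotic relation $1 - (1 - e^{-t})^t \sim te^{-t}$ as $t \to \infty$, we obtain that for $L$ large enough, the quantity in (16) is bounded by $\sqrt{\varphi_L}\,e^{-(\theta_1/4)\sqrt{\varphi_L}}$.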

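The asymptotic relation $1 - (1 - e^{-t})^t \sim te^{-t}$ used above follows from $\log(1 - e^{-t}) = -e^{-t} + O(e^{-2t})$. The short check below is a numerical sanity check, not part of the proof; it evaluates the ratio of the two sides with `math.log1p` and `math.expm1`, which avoid the cancellation that a naive evaluation of $1 - (1 - e^{-t})^t$ suffers for large $t$.

```python
import math

def lhs(t):
    """1 - (1 - e^{-t})^t, evaluated stably as -expm1(t * log1p(-e^{-t}))."""
    return -math.expm1(t * math.log1p(-math.exp(-t)))

def ratio(t):
    """Ratio of 1 - (1 - e^{-t})^t to its claimed equivalent t * e^{-t}."""
    return lhs(t) / (t * math.exp(-t))

if __name__ == "__main__":
    for t in (2.0, 5.0, 10.0, 20.0):
        print(f"t={t:5.1f}  ratio={ratio(t):.8f}")
```

A second-order expansion gives ratio $= 1 + (1 - t)e^{-t}/2 + O(t^2e^{-2t})$, so the printed ratios approach 1 from below once $t > 1$.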
As concerns (17), observe that there exists $\theta_4 > 0$ such that for every $L \ge 1$, each of the $q_i^L - Q_i^L$ is stochastically bounded by an exponential random variable with parameter $\theta_4$. Indeed, when $X_{AB}^L$ lies within $B(0, (3/2)R_B)$, the rate at which a coalescence occurs due to a large event is bounded from below by a positive constant. On the other hand, it is not difficult to check that when $X_{AB}^L$ lies within $B(0, (3/2)R_B)^c$, the rate at which the two lineages are sent at a distance greater than $3R_B$ by a large event is also bounded from below by a positive constant. The quantity in (17) is therefore bounded by



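$$\mathbb{P}_0\Big[\sum_{i=1}^{\lfloor\sqrt{\varphi_L}\rfloor}(q_i^L - Q_i^L) > \frac{\psi_L}{2}\Big] \le \mathbb{P}\Big[\sum_{i=1}^{\lfloor\sqrt{\varphi_L}\rfloor}\mathcal{E}_i > \frac{\varphi_L}{2}\Big] \le \exp(-c\,\varphi_L) \quad \text{for } L \text{ large enough},$$

where $(\mathcal{E}_i)_{i\in\mathbb{N}}$ is a sequence of i.i.d. exponential random variables with parameter $\theta_4$ and $c$ is a positive constant expressed in terms of the exponential moment of $\mathcal{E}_1$. The result follows. □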



4.2 Decorrelation time of two lineages starting in the same individual 

In the previous section, we obtained some information on the time required for two lineages starting in the same individual to become separated by a distance greater than $3R_B$. We know that the lineages behave independently whenever they are at distance greater than $2R_B$. However, nothing guarantees that after the random time $T^{(3R_B)}$ of Lemma 4.3 the ancestral lineages of $A$ and $B$ will evolve independently. Indeed, it is very likely that after some time they will once again be within distance $2R_B$ of one another, and coalescence events will keep them close together for a potentially long period of time. Hence, in order to prove Theorem 1.4, we would like to know how much time our lineages need before they start 'looking' as if they were independent. That is, we are interested in the time until their separation is of the same order as if they had evolved according to independent copies of $\xi^L$ started from 0. Recall from Lemma A that for (large) times less than $L^{2(1-\alpha)}/\sqrt{\log L}$, the difference of two independent lineages behaves like Brownian motion on $\mathbb{R}^2$. The following proposition thus tells us that the decorrelation time we are looking for is asymptotically bounded from above by $(\log L)^5\big(1 + 2\theta_2\log(\rho_L\log L)/(r_L\rho_L)\big)$.

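Proposition 4.4. Let $(T_L)_{L\ge1}$ be a sequence of times such that

$$(\log L)^5\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big) \le T_L \le \frac{L^{2(1-\alpha)}}{\log L}$$

for every $L$. Then,

$$\lim_{L\to\infty}\mathbb{P}_{a_1}\Big[|X_{AB}^L(T_L)| \notin \Big[\frac{\sqrt{T_L}}{\log L},\ \sqrt{T_L}\log L\Big]\Big] = 0.$$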

The scheme of the proof of Proposition 4.4 will again be to decompose the path of $X_{AB}^L$ into appropriate excursions and incursions. We shall show that the proportion of the time before $T_L$ that $X_{AB}^L$ spends in the region of space where it does not evolve like the difference of two independent lineages is asymptotically negligible.

To this end, for every $L \in \mathbb{N}$, let us define the stopping times $(Q_i^L)_{i\ge0}$ and $(q_i^L)_{i\ge0}$ by $q_0^L = Q_0^L = 0$, and for every $i \ge 1$,

$$\begin{aligned}
Q_i^L &:= \inf\{t > q_{i-1}^L : X_{AB}^L(t) \notin B(0, 3R_B)\},\\
q_i^L &:= \inf\{t > Q_i^L : X_{AB}^L(t) \in B(0, 2R_B)\},
\end{aligned}$$

with the convention that $\inf\emptyset = +\infty$. We also write $k_L$ for the number of 'excursions' that start before time $T_L$, that is

$$k_L := \max\{i : Q_i^L < T_L\}.$$
The first step in proving Proposition 4.4 is to show the following lemma.

Lemma 4.5. For every $\delta \in (0, 1/2)$, there exists $K(\delta) > 0$ such that for all $L$ large enough,

$$\mathbb{P}_{a_1}\big[k_L > K(\delta)\log T_L\big] < \delta.$$

We postpone the proof of Lemma 4.5 until the end of the section and instead exploit it to prove Proposition 4.4.



Proof of Proposition 4.4.
We construct a coupling between $X_{AB}^L$ and a compound Poisson process $Y^L$ which evolves as the difference between two independent copies of $\xi^L$. Define $Y^L$ as follows: during an excursion of $X_{AB}^L$, $Y^L$ makes the same jumps as $X_{AB}^L$ at the same times, that is

$$\forall\, i \ge 1,\ \forall\, t \in (Q_i^L, q_i^L], \qquad Y^L(t) - Y^L(t-) = X_{AB}^L(t) - X_{AB}^L(t-).$$

During the remaining time, $Y^L$ jumps independently of $X_{AB}^L$ with a jump intensity equal to twice that given in (1), rescaled in an appropriate manner. It is easy to check that the law of $Y^L$ is indeed as claimed, since outside $B(0, 2R_B)$, $X_{AB}^L$ evolves like the difference of two independent lineages, and so the jump intensity corresponding to the process $Y^L$ is equal to twice that in the rescaled version of (1) at any time. Furthermore, by construction, the difference between $X_{AB}^L$ and $Y^L$ changes only during the time intervals $[q_{i-1}^L, Q_i^L]$. For convenience, we retain the notation $\mathbb{P}$ for the probability measures on the (larger) space of definition of the pair $(X_{AB}^L, Y^L)$, and set $Y^L(0) = 0$, $\mathbb{P}_{a_1}$-a.s.

Let us call $I_L$ the amount of time before $T_L$ during which $X_{AB}^L$ and $Y^L$ behave independently, that is

$$I_L := \sum_{i=1}^{k_L}(Q_i^L - q_{i-1}^L) + (T_L - q_{k_L}^L)^+.$$



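If $\theta_2$ is as in Proposition 4.1, we have

$$\begin{aligned}
\mathbb{P}_{a_1}\Big[|X_{AB}^L(T_L)| \notin \Big[\frac{\sqrt{T_L}}{\log L}, \sqrt{T_L}\log L\Big]\Big] &\le \mathbb{P}_{a_1}\Big[|X_{AB}^L(T_L)| \notin \Big[\frac{\sqrt{T_L}}{\log L}, \sqrt{T_L}\log L\Big]\,;\, I_L \le (\log L)^2\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\Big]\\
&\quad + \mathbb{P}_{a_1}\Big[I_L > (\log L)^2\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\Big]. \qquad (19)
\end{aligned}$$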



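First, let us show that the second term in the right-hand side of (19) converges to 0 as $L \to \infty$. Let $\delta \in (0, 1/2)$. By Lemma 4.5, there exists $K > 1$ such that for $L$ large enough, $\mathbb{P}_{a_1}[k_L > K\log T_L] < \delta$. Hence, we can write

$$\mathbb{P}_{a_1}\Big[I_L > (\log L)^2\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\Big] \le \mathbb{P}_{a_1}\Big[I_L > (\log L)^2\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\,;\, k_L \le K\log T_L\Big] + \delta.$$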



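Now, by the same reasoning as in (18) we have

$$\begin{aligned}
&\mathbb{P}_{a_1}\Big[I_L > (\log L)^2\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\,;\, k_L \le K\log T_L\Big]\\
&\quad\le \mathbb{P}_{a_1}\Big[\sum_{i=1}^{\lfloor K\log T_L\rfloor+1}(Q_i^L - q_{i-1}^L) > (\log L)^2\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\Big]\\
&\quad\le 1 - \Big(1 - \sup\,\mathbb{P}\Big[Q_1^L > \frac{(\log L)^2}{\lfloor K\log T_L\rfloor+1}\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\Big]\Big)^{\lfloor K\log T_L\rfloor+1}, \qquad (20)
\end{aligned}$$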



where the supremum is taken over all initial configurations in which the distance between the blocks containing $A$ and $B$ is at most $2R_B$. Again as in (18), we can restrict our attention to initial configurations in which $A$ and $B$ belong to the same block (recall from the proof of Lemma 4.3 that the rate at which a sequence of 'separating' events occurs is bounded from below by a positive constant whenever $X_{AB}^L \neq 0$). By assumption, $\log T_L \le 2\log L$ and $K > 1$, and so using Lemma 4.3 with $\varphi_L = (\log L)/(2K)$ for the last inequality, we obtain that for all large $L$, uniformly in initial configurations $a$ as above,

$$\begin{aligned}
\mathbb{P}_a\Big[Q_1^L > \frac{(\log L)^2}{\lfloor K\log T_L\rfloor+1}\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\Big] &\le \mathbb{P}_0\Big[T^{(3R_B)} > \frac{\log L}{2K}\Big(1 + \frac{2\theta_2\log((2K)^{-1}\rho_L\log L)}{r_L\rho_L}\Big)\Big]\\
&\le \sqrt{(2K)^{-1}\log L}\;e^{-\theta_3\sqrt{(2K)^{-1}\log L}}.
\end{aligned}$$

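Consequently, we obtain from the asymptotic relation $1 - (1 - te^{-t})^{t^2} \sim t^3e^{-t}$ (as $t \to \infty$) that the quantity in the right-hand side of (20) tends to zero as $L \to \infty$, and

$$\limsup_{L\to\infty}\mathbb{P}_{a_1}\Big[I_L > (\log L)^2\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\Big] \le \delta.$$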



Since $\delta$ was arbitrary, this limit is actually zero.

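To see how fast the right-hand side of (20) vanishes, note that with $t_L := \sqrt{(2K)^{-1}\log L}$ it is at most $1 - (1 - t_Le^{-\theta_3t_L})^{2K\log L + 1}$, since $\lfloor K\log T_L\rfloor + 1 \le 2K\log L + 1$. The sketch below evaluates this upper bound as a function of $\ell = \log L$; the values $K = 2$ and $\theta_3 = 1$ are purely illustrative assumptions (the text only asserts that some $\theta_3 > 0$ exists), and the function name is hypothetical.

```python
import math

def rhs_bound(ell, K=2.0, theta3=1.0):
    """Upper bound 1 - (1 - t e^{-theta3 t})^{2 K ell + 1} with t = sqrt(ell / (2K)),
    for the right-hand side of (20), written in terms of ell = log L.
    K and theta3 are illustrative values, not constants taken from the text."""
    t = math.sqrt(ell / (2.0 * K))
    p = t * math.exp(-theta3 * t)   # Lemma 4.3 bound on each factor
    if p >= 1.0:
        return 1.0                  # the bound is trivial in this range
    n = 2.0 * K * ell + 1.0         # upper bound on floor(K log T_L) + 1
    return -math.expm1(n * math.log1p(-p))  # 1 - (1 - p)^n, computed stably

if __name__ == "__main__":
    for ell in (1e2, 1e3, 1e4, 1e5):
        print(f"log L = {ell:8.0f}   bound = {rhs_bound(ell):.3e}")
```

The bound is useless for moderate $\log L$ and then collapses quickly, reflecting that the number of factors grows only linearly in $\log L$ while each factor decays like $e^{-c\sqrt{\log L}}$.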
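Let us now show that the first term in the right-hand side of (19) tends to zero as $L \to \infty$. To this end, observe that it is bounded by

$$\begin{aligned}
&\mathbb{P}_{a_1}\Big[I_L \le (\log L)^2\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\,;\, |X_{AB}^L(T_L) - Y^L(T_L)| > (\log\log L)(\log L)\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)^{1/2}\Big]\\
&+ \mathbb{P}_{a_1}\Big[I_L \le (\log L)^2\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\,;\, |X_{AB}^L(T_L)| \notin \Big[\frac{\sqrt{T_L}}{\log L}, \sqrt{T_L}\log L\Big]\,;\\
&\qquad\qquad |X_{AB}^L(T_L) - Y^L(T_L)| \le (\log\log L)(\log L)\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)^{1/2}\Big]. \qquad (21)
\end{aligned}$$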



Because the difference $X_{AB}^L - Y^L$ changes only during the periods $[q_{i-1}^L, Q_i^L]$, during which $|X_{AB}^L| < 3R_B$ and $Y^L$ jumps around according to twice the jump intensity given by the appropriate rescaling of (1), the first term in (21) is bounded by

$$\mathbb{P}_{a_1}\Big[I_L \le (\log L)^2\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\,;\, |\hat Y^L(I_L)| + 3R_B > (\log\log L)(\log L)\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)^{1/2}\Big],$$

where $\hat Y^L$ is an independent copy of $Y^L$ starting from 0. Hence, we also have as an upper bound

$$\mathbb{P}_{a_1}\Big[|\hat Y^L(I_L)| > (\log\log L)\sqrt{I_L} - 3R_B\Big],$$

which tends to zero by a standard use of Markov's inequality and Equation (22) of [BEV10].
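As concerns the second term in (21), it is bounded by

$$\begin{aligned}
&\mathbb{P}_{a_1}\Big[|Y^L(T_L)| \notin \Big[\frac{\sqrt{T_L}}{\log L} + (\log\log L)(\log L)\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)^{1/2},\ \sqrt{T_L}\log L - (\log\log L)(\log L)\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)^{1/2}\Big]\Big]\\
&\quad= \mathbb{P}_{a_1}\Big[|Y^L(T_L)| \notin \Big[\frac{\sqrt{T_L}}{\log L}\big(1 + \varepsilon_L^{(1)}\big),\ \sqrt{T_L}\log L\big(1 - \varepsilon_L^{(2)}\big)\Big]\Big],
\end{aligned}$$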
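where, by assumption on $T_L$ and the fact that $\rho_L \ge \log L$,

$$\varepsilon_L^{(1)} := \frac{(\log L)^2\log\log L}{\sqrt{T_L}}\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)^{1/2} \le c\,\frac{\log\log L}{\sqrt{\log L}}$$

and

$$\varepsilon_L^{(2)} := \frac{\log\log L}{\sqrt{T_L}}\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)^{1/2} \le c\,\frac{\log\log L}{(\log L)^{5/2}}.$$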



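The remaining probability can be made explicit in the Gaussian limit: if $Y^L(T_L)/\sqrt{T_L}$ is close to a centred isotropic Gaussian vector $G$ with covariance $\sigma^2 I$, then $|G|^2/\sigma^2$ has a $\chi^2_2$ (exponential with mean 2) distribution, so $\mathbb{P}[|G| \le r] = 1 - e^{-r^2/(2\sigma^2)}$. The sketch below assumes this isotropic Gaussian approximation and takes $\sigma^2 = 1$ and $c = 1$ purely for illustration; it evaluates the probability of leaving the window $[(1+\varepsilon_L^{(1)})/\log L,\ (1-\varepsilon_L^{(2)})\log L]$ using the bounds on $\varepsilon_L^{(1)}$, $\varepsilon_L^{(2)}$ displayed above, with $\ell = \log L$. The function name is hypothetical.

```python
import math

def window_exit_prob(ell, sigma2=1.0, c=1.0):
    """P[|G| outside [(1 + e1)/ell, (1 - e2) * ell]] for G ~ N(0, sigma2 * I_2), where
    ell = log L, e1 = c loglog L / sqrt(log L) and e2 = c loglog L / (log L)^{5/2}."""
    e1 = c * math.log(ell) / math.sqrt(ell)
    e2 = c * math.log(ell) / ell ** 2.5
    a = (1.0 + e1) / ell   # lower edge of the window
    b = (1.0 - e2) * ell   # upper edge of the window
    # For a centred 2D Gaussian, P[|G| <= r] = 1 - exp(-r^2 / (2 * sigma2)).
    p_below = -math.expm1(-a * a / (2.0 * sigma2))
    p_above = math.exp(-b * b / (2.0 * sigma2))
    return p_below + p_above

if __name__ == "__main__":
    for ell in (10.0, 30.0, 100.0, 1000.0):
        print(f"log L = {ell:6.0f}   exit probability = {window_exit_prob(ell):.2e}")
```

The lower edge dominates: the exit probability behaves like $a^2/(2\sigma^2)$ with $a \approx 1/\log L$, so it vanishes, but only at rate $(\log L)^{-2}$.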
An application of the central limit theorem then gives the result. □

The proof of Lemma 4.5 rests upon the following lemma.



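Lemma 4.6. There exist $C_\theta, v_\theta > 0$ such that for every $L$ large enough, every $v_\theta \le v \le L^{2(1-\alpha)}/(\log L)$ and every initial condition $a'$ in which the separation between $A$ and $B$ belongs to $B(0, 5R_B)\setminus B(0, 3R_B)$,

$$\mathbb{P}_{a'}\big[q_1^L > v\big] \ge \frac{C_\theta}{\log v}.$$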



The proof of Lemma 4.6 uses the same arguments as the second half of the proof of Lemma 4.2 (based on Skorokhod embedding), and so we omit it.

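Proof of Lemma 4.5. Our strategy is to show that if we choose $K$ large enough, the probability that none of the first $K\log T_L$ excursions outside $B(0, 3R_B)$ has a duration of order $T_L$ is smaller than $\delta$. To achieve this, let $K > 0$. We have

$$\begin{aligned}
\mathbb{P}_{a_1}[k_L > K\log T_L] &= \mathbb{P}_{a_1}\big[Q^L_{\lfloor K\log T_L\rfloor+1} < T_L\big]\\
&= \mathbb{P}_{a_1}\Big[\sum_{i=1}^{\lfloor K\log T_L\rfloor+1}(Q_i^L - q_{i-1}^L) + \sum_{i=1}^{\lfloor K\log T_L\rfloor}(q_i^L - Q_i^L) < T_L\Big]\\
&\le \mathbb{P}_{a_1}\Big[\sum_{i=1}^{\lfloor K\log T_L\rfloor}(q_i^L - Q_i^L) < T_L\Big]\\
&\le \mathbb{P}_{a_1}\big[\forall\, i \in \{1, \dots, \lfloor K\log T_L\rfloor\},\ q_i^L - Q_i^L < T_L\big].
\end{aligned}$$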

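Using a recursion and Lemma 4.6, together with the fact that $|X_{AB}^L(Q_i^L)| \in [3R_B, 5R_B]$ (recall the jump lengths are bounded by $2R_B$), we arrive at

$$\mathbb{P}_{a_1}\big[\forall\, i \in \{1, \dots, \lfloor K\log T_L\rfloor\},\ q_i^L - Q_i^L < T_L\big] \le \Big(1 - \frac{C_\theta}{\log T_L}\Big)^{\lfloor K\log T_L\rfloor} \longrightarrow e^{-KC_\theta}$$

as $L \to \infty$.

Now choose $K(\delta)$ large enough that $e^{-K(\delta)C_\theta} < \delta/2$, and Lemma 4.5 is proved. □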

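The choice of $K(\delta)$ at the end of the proof is explicit once $C_\theta$ is known. The sketch below, with hypothetical function names and an illustrative value of $C_\theta$ (the text does not specify it), computes the smallest integer $K$ with $e^{-KC_\theta} < \delta/2$ and compares the finite-$L$ quantity $(1 - C_\theta/\log T_L)^{\lfloor K\log T_L\rfloor}$ with its limit $e^{-KC_\theta}$.

```python
import math

def smallest_K(delta, C_theta):
    """Smallest positive integer K with exp(-K * C_theta) < delta / 2 (for 0 < delta < 1/2)."""
    return math.floor(math.log(2.0 / delta) / C_theta) + 1

def finite_L_probability_bound(K, C_theta, log_T):
    """(1 - C_theta / log T)^{floor(K log T)}, computed in log-space for large log T."""
    n = math.floor(K * log_T)
    return math.exp(n * math.log1p(-C_theta / log_T))

if __name__ == "__main__":
    delta, C_theta = 0.1, 0.5  # illustrative values only
    K = smallest_K(delta, C_theta)
    print("K(delta) =", K, "  limit exp(-K C_theta) =", math.exp(-K * C_theta))
    for log_T in (10.0, 100.0, 1e4):
        print(log_T, finite_L_probability_bound(K, C_theta, log_T))
```

Since $\log(1-x) \le -x$, the finite-$L$ quantity is always at most $e^{-KC_\theta + C_\theta/\log T_L}$, so the margin $\delta/2$ in the choice of $K(\delta)$ absorbs the discrepancy for $L$ large enough.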


4.3 Proof of the main results 

Now that we understand decorrelation better, we can prove Theorems 1.4 and 1.5. Recall the rescalings of time by a factor $\rho_L$ and of space by $L^{-\alpha}$ that have been in force so far, and the notation $T_{ij}^L$ for the coalescence time of lineages $i$ and $j$ in original units. In order to work in the rescaled setting, we define $t_{ij}^L := T_{ij}^L/\rho_L$ for every $i, j \in \{A, a, B, b\}$, and $t^L := t_{Aa}^L \wedge t_{Bb}^L$. We denote the genealogical process (on the original space and time scales) of the four loci corresponding to step $L$ by $\mathcal{A}^L$. As explained in Section 1.4, this Markov process takes its values in the set of all marked partitions of $\{A, a, B, b\}$. For any $t \ge 0$, each block of $\mathcal{A}^L(t)$ contains the labels of the lineages present in the same individual at (genealogical) time $t$, and its mark gives the current location on $\mathbb{T}(L)$ of this common ancestor.
Remark 4.7. Several times during the course of the proofs below we shall apply Proposition 4.4 with $T_L = L^{2(\beta-\alpha)}$. Strictly speaking, we can only do this if $L^{2(\beta-\alpha)} \ge (\log L)^5\big(1 + 2\theta_2\log(\rho_L\log L)/(r_L\rho_L)\big)$, at least for $L$ large enough, which is not guaranteed by our assumptions. However, if it is not the case, we can still find a sequence $(\varphi_L)_{L\in\mathbb{N}}$ tending to infinity and such that

$$\varphi_LL^{2(\beta-\alpha)} \ge (\log L)^5\Big(1 + \frac{2\theta_2\log(\rho_L\log L)}{r_L\rho_L}\Big)\quad \forall\, L \in \mathbb{N}, \qquad \text{and} \qquad \lim_{L\to\infty}\frac{\log\varphi_L}{\log L} = 0.$$

Now, for the sake of clarity we presented the results of Lemma 3.4 at times of the form $\rho_LL^{2(\beta-\alpha)}$, but its proof shows that, because $\log(\varphi_LL^{2(\beta-\alpha)}) \sim \log(L^{2(\beta-\alpha)})$ as $L \to \infty$, we also have

(Another way to see this is to compare the probabilities at times $\rho_L\varphi_LL^{2(\beta-\alpha)}$ and $\rho_LL^{2(t-\alpha)}$ for any fixed $t > \beta$, using that $\varphi_LL^{2(\beta-\alpha)} \le L^{2(t-\alpha)}$ for $L$ large enough, and then to let $t$ tend to $\beta$.) Hence, all the above arguments carry over with $L^{2(\beta-\alpha)}$ replaced by $\varphi_LL^{2(\beta-\alpha)}$. Since the modifications are minor, we work with $L^{2(\beta-\alpha)}$ in all cases.



Proof of Theorem 1.4. The main difficulty is that we are interested in the first coalescence times of the pairs $(A, a)$ and $(B, b)$, regardless of that of any other pair. As a consequence, several coalescence and subsequent recombination events may occur before $t^L$, creating some correlation between lineages originally far from each other ($A$ and $b$, for instance). The point is to show that on the timescale of interest, decorrelation occurs fast enough for the system of ancestral lineages to behave like two independent genealogical processes, one for each locus.

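Let us start by showing a). Note that we can assume $\beta < 1$, since otherwise the result follows from Proposition 1.2 and the bound

$$\mathbb{P}_{a_1}\big[t^L \le L^{2(\beta-\alpha)}\big] \le \mathbb{P}_{a_1}\big[t_{Aa}^L \le L^{2(\beta-\alpha)}\big] + \mathbb{P}_{a_1}\big[t_{Bb}^L \le L^{2(\beta-\alpha)}\big] \longrightarrow 0 \quad \text{as } L \to \infty.$$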



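Hence, suppose $\beta < 1$, fix $t \in (\beta, 1]$ (the case $t = \beta$ is treated as above) and let $L \in \mathbb{N}$. By the Markov property applied to $\mathcal{A}^L$ at time $\rho_LL^{2(\beta-\alpha)}$, we have

$$\begin{aligned}
\mathbb{P}_{a_1}\big[t^L > L^{2(t-\alpha)}\big] &= \mathbb{E}_{a_1}\Big[\mathbb{P}_{\mathcal{A}^L(\rho_LL^{2(\beta-\alpha)})}\big[t^L > L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\big]\Big]\\
&\quad - \mathbb{E}_{a_1}\Big[\mathbf{1}_{\{t^L \le L^{2(\beta-\alpha)}\}}\,\mathbb{P}_{\mathcal{A}^L(\rho_LL^{2(\beta-\alpha)})}\big[t^L > L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\big]\Big]. \qquad (22)
\end{aligned}$$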
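Again, the second term in (22) is bounded by

$$\mathbb{P}_{a_1}\big[t_{Aa}^L \le L^{2(\beta-\alpha)}\big] + \mathbb{P}_{a_1}\big[t_{Bb}^L \le L^{2(\beta-\alpha)}\big],$$

which tends to 0 as $L \to \infty$ by Proposition 1.2. Since Lemma 3.7 shows that, with probability tending to 1, at most two lineages at a time can meet at distance less than $2R_B$, we can define $T_1^L$ as the first time two of the four lineages come within distance $2R_B$ of each other and write

$$\begin{aligned}
\mathbb{E}_{a_1}\Big[\mathbb{P}_{\mathcal{A}^L(\rho_LL^{2(\beta-\alpha)})}\big[t^L > L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\big]\Big] &= \mathbb{E}_{a_1}\Big[\mathbb{P}_{\mathcal{A}^L(\rho_LL^{2(\beta-\alpha)})}\big[T_1^L \ge L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\big]\Big]\\
&\quad + \mathbb{E}_{a_1}\Big[\mathbb{P}_{\mathcal{A}^L(\rho_LL^{2(\beta-\alpha)})}\big[T_1^L < L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\,;\, t^L > L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\big]\Big]. \qquad (23)
\end{aligned}$$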
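Setting aside the first term in the right-hand side of (23) for a moment, we further decompose the event corresponding to the second term:

$$\begin{aligned}
&\mathbb{P}_{\mathcal{A}^L(\rho_LL^{2(\beta-\alpha)})}\big[T_1^L < L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\,;\, t^L > L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\big]\\
&\quad= \mathbb{P}_{\mathcal{A}^L(\rho_LL^{2(\beta-\alpha)})}\big[T_1^L < L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\,;\, m_1^L \notin \{Aa, Bb\}\,;\, t^L > L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\big]\\
&\qquad+ \mathbb{P}_{\mathcal{A}^L(\rho_LL^{2(\beta-\alpha)})}\big[T_1^L < L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\,;\, m_1^L \in \{Aa, Bb\}\,;\, t^L > L^{2(t-\alpha)} - L^{2(\beta-\alpha)}\big], \qquad (24)
\end{aligned}$$

where $m_1^L$ denotes the pair of labels of the lineages which 'meet' at time $T_1^L$. Let us show that the second term in (24) tends to 0 as $L \to \infty$. Using Lemma 3.4, we know that, with probability tending to one, no pairs of lineages starting at (rescaled) separation $L^{-\alpha}x_1$ have met at distance less than $2R_B$ by time $L^{2(\beta-\alpha)}$. Hence, until this time any of these pairs taken separately evolves like two independent compound Poisson processes, and their mutual distance at time $L^{2(\beta-\alpha)}$ lies within $[L^{\beta-\alpha}/(\log L), L^{\beta-\alpha}\log L]$ with probability tending to one (by a standard application of the central limit theorem). On the other hand, by Condition (2) we can use Proposition 4.4 with $T_L = L^{2(\beta-\alpha)}$ (see Remark 4.7) and conclude that with probability tending to 1, the distance at time $T_L$ between each pair of lineages starting within the same individual also lies in $[L^{\beta-\alpha}/(\log L), L^{\beta-\alpha}\log L]$. The situation has thus become rather symmetric by time $L^{2(\beta-\alpha)}$. Suppose for instance that $m_1^L = Aa$. Then, either $T_1^L < L^{2(t-\alpha)} - L^{2(\beta-\alpha)} - \log L$ and $t^L > L^{2(t-\alpha)} - L^{2(\beta-\alpha)}$, or $T_1^L \in [L^{2(t-\alpha)} - L^{2(\beta-\alpha)} - \log L,\ L^{2(t-\alpha)} - L^{2(\beta-\alpha)})$. The probability of the first event tends to 0 by (11), which shows that once $A$ and $a$ are gathered at distance smaller than $2R_B$, they coalesce in a time smaller than $\log L$. Lemma 3.1 (if $t < 1$) or (10) (if $t = 1$) shows that the probability of the second event also tends to 0 as $L \to \infty$. Hence, the second term in (24) does indeed vanish as $L \to \infty$.
So far, we have obtained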



\, [t^>L2(*-")]=E,^ 



P^.(,^i2(,-.))[T[ < - l2(/5-) ; m[ i {Aa,Bh} ; 

>l2(*-)-l2(/'-°)]1 (25) 



where 5\ 



as L — )• oo. Next, by the strong Markov property applied to at time 



PlT^ and the fact that < t a.s., we have 



.4i(pLL2{/3-")) 



[T[ < l2(*-") - l2(/5-") ; mf i {Aa,Bb] ; t^ > l2(*-") - L^^^"")] 



E, 



E 



.4i{p£L2(/3-a)) [l{Tf'<L2(t-")-L2(/3-t.). m^^{Aa,Bh}} 

If $t < 1$, Lemma 3.7 tells us that with probability tending to 1, the mutual distance between each of the 5 pairs of lineages different from $m_1^L$ at time $T_1^L$ belongs to the interval $[(T_1^L)^{1/2}/(\log L), (T_1^L)^{1/2}\log L]$. If $t = 1$, the estimate used above in the case $t = 1$ shows that we can replace $\mathbf{1}_{\{T_1^L < L^{2(1-\alpha)} - L^{2(\beta-\alpha)}\}}$ by $\mathbf{1}_{\{T_1^L < L^{2(1-\alpha)}/(\log L)\}}$, up to an asymptotically vanishing error term, and so Lemma 3.7 still applies. Hence, by the uniform convergence stated in Lemma 3.7, the probability that one of these pairs meets at distance less than $2R_B$ before $2T_1^L$ tends to zero. Furthermore, Proposition 4.4 guarantees that with very high probability, the pair that meets at time $T_1^L$ is also at a distance belonging to $[(T_1^L)^{1/2}/(\log L), (T_1^L)^{1/2}\log L]$ after another $T_1^L$ units of time. (This statement uses a conditioning on $T_1^L$, which turns $2T_1^L$ into a deterministic time and enables us to use Proposition 4.4.) Defining $T_2^L$ and $m_2^L$ in the same manner as above (we number the different quantities which appear here to make the recursion clearer) and using exactly the same arguments as those leading to (25), we can thus write that with probability tending to 1,



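$$\begin{aligned}
\mathbb{P}_{\mathcal{A}^L(\rho_LT_1^L)}\big[t^L > L^{2(t-\alpha)} - L^{2(\beta-\alpha)} - T_1^L\big] &= \mathbb{E}_{\mathcal{A}^L(\rho_LT_1^L)}\Big[\mathbb{P}_{\mathcal{A}^L(2\rho_LT_1^L)}\big[T_2^L \ge L^{2(t-\alpha)} - L^{2(\beta-\alpha)} - 2T_1^L\big]\Big]\\
&\quad + \mathbb{E}_{\mathcal{A}^L(\rho_LT_1^L)}\Big[\mathbb{P}_{\mathcal{A}^L(2\rho_LT_1^L)}\big[T_2^L < L^{2(t-\alpha)} - L^{2(\beta-\alpha)} - 2T_1^L\,;\, m_2^L \notin \{Aa, Bb\}\,;\\
&\qquad\qquad t^L > L^{2(t-\alpha)} - L^{2(\beta-\alpha)} - 2T_1^L\big]\Big] + \delta_L^2,
\end{aligned}$$

with $\delta_L^2 \to 0$ as $L \to \infty$. It is easy to check that the above equality is also valid if $L^{2(t-\alpha)} - L^{2(\beta-\alpha)} - 2T_1^L < 0$. By induction, we obtain for any $k \in \mathbb{N}$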

Fa, [t^ > L2(*-")] 



+ E, 



2(/3-«)l 



E 



l{Tf <L2(*-«)~L2C'5-");m.f' ^{Aa.Bfe}} 



+ 

+ E, 



XE 



E, 



'^^(piLaCS— )) 
■^i{2pLTf') 

^^.A-E'(2p£T^) ■■■ ^^■f'(2pi,T^_2) l{T|r_i<L2{i-")-L2(/3-")-2Tf'-...-2T^_2} 



l{T^<L2{t-«)-L2(/3-«)-2Tf'}l{m^^{Aa,B6}} 



{"ifc_i^{^a,m}}- 



fc-i I 



+ E, 



t^ > - L^^^-") - 2Tf' 2T 



2T 



fe-ij 



k 



(26) 



in which all occurrences of L^(^~") are replaced by L^(^~") / (log L) if we are considering the 
case t = 1. In order to stop the recursion, let us show that for any e > 0, there exists /c G N 
such that the last but one term in ()26p is bounded by e for all L large enough. To this end, 
define the sequence of random times (7/^ )i>i by 

7^ := inf {t > L?'^^~°^'^ : 2 rescaled lineages meet at distance less than 2Rb^-, 

and for any i >2, 

7^ := inf {t > 27^^^ : 2 rescaled lineages meet at distance less than 2Rb}- 

A simple recursion shows that for all i G N, 7^^ and 27^^ are stopping times. We can thus 
apply the strong Markov property at time PLli , then PLI2 ^ and so on, and obtain that 



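$$\mathbb{P}_{a_1}\big[\gamma_1^L < L^{2(t-\alpha)}, \dots, \gamma_k^L < L^{2(t-\alpha)}\big] = \mathbb{E}_{a_1}\Big[\mathbf{1}_{\{\gamma_1^L < L^{2(t-\alpha)}\}}\,\mathbb{E}_{a_1}\Big[\mathbf{1}_{\{\gamma_2^L < L^{2(t-\alpha)}\}}\cdots\,\mathbb{P}_{a_1}\big[\gamma_k^L < L^{2(t-\alpha)}\,\big|\,\mathcal{F}_{2\rho_L\gamma_{k-1}^L}\big]\cdots\,\Big|\,\mathcal{F}_{2\rho_L\gamma_1^L}\Big]\Big], \qquad (27)$$

where $(\mathcal{F}_s)_{s\ge0}$ denotes the natural filtration of $\mathcal{A}^L$.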
Since with probability tending to 1 at each time $2\gamma_i^L$ the four lineages are at distance of the order of $(\gamma_i^L)^{1/2}$ of each other, Proposition 3.6 guarantees that, up to an asymptotically vanishing error term, the conditional probability given $\mathcal{F}_{2\rho_L\gamma_i^L}$ that $\gamma_{i+1}^L$ is less than $L^{2(t-\alpha)}$ is bounded from above by $\xi := (1 + c)\big(1 - ((\beta-\alpha)/(t-\alpha))^6\big)$, where $c > 0$ can be chosen arbitrarily close to 0. It remains to choose $k \in \mathbb{N}$ such that $\xi^k < \varepsilon$ and to notice that the left-hand side of (27) is an upper bound for the last but one term in (26) to conclude.

Finally, let us show that the other terms in (26) are close to those corresponding to a system of four independent lineages. Using the integer $k = k(\varepsilon)$ obtained in the last paragraph, we rewrite the decomposition (26) in terms of $(\gamma_i^L)_{i\in\mathbb{N}}$ as follows (we retain the notation $m_i^L$ for the labels of the two lineages meeting at time $\gamma_i^L$, and we set $\gamma_0^L := 0$):

$$\mathbb{P}_{a_1}\big[t^L > L^{2(t-\alpha)}\big] = \eta_L(\varepsilon) + \sum_{j=1}^{k}\mathbb{P}_{a_1}\big[\gamma_{j-1}^L < L^{2(t-\alpha)}\,;\, m_l^L \notin \{Aa, Bb\}\ \forall\, l \in \{1, \dots, j-1\}\,;\, \gamma_j^L \ge L^{2(t-\alpha)}\big], \qquad (28)$$

where $\eta_L(\varepsilon)$ is the sum of the last but one term in (26) and of the error terms $\delta_L^i$, and is smaller than $2\varepsilon$ for $L$ large enough by definition of $k(\varepsilon)$. Now, let us denote by $\hat{\mathcal{A}}^L$ a system of four independent lineages moving around on $\mathbb{T}(L)$ according to the law of the motion of a single (unrescaled) lineage, and let us define $(\hat\gamma_i^L)_{i\ge1}$ in the same way as $(\gamma_i^L)_{i\ge1}$ but with $\mathcal{A}^L$ replaced by $\hat{\mathcal{A}}^L$. Let us also write $\hat t_{Aa}^L$ (resp., $\hat t_{Bb}^L$) for the smallest time $t$ such that the lineages $A$ and $a$ (resp., $B$ and $b$) meet at distance less than $2R_BL^{\alpha}$ at time $\rho_Lt$, and $\hat m_i^L$ for the indices of the pair meeting at time $\hat\gamma_i^L$. Exactly the same chain of arguments as above leads to a decomposition of $\mathbb{P}_{a_1}[\hat t_{Aa}^L \wedge \hat t_{Bb}^L > L^{2(t-\alpha)}]$ of the form (28), with another sequence $(\hat\eta_L(\varepsilon))_{L\ge1}$ whose terms are bounded by $2\varepsilon$ whenever $L$ is large enough. Now, let us emphasize that Proposition 3.6 also applies to the meeting times at distance less than $2R_BL^{\alpha}$, before which the evolutions of $\mathcal{A}^L$ and $\hat{\mathcal{A}}^L$ have the same distribution. As a consequence, morally, the distributions of the pairs of indices $m_i^L$ and $\hat m_i^L$ should both converge to a uniform draw from the set of distinct pairs of labels (in other words, each pair has asymptotically the same chance to be the one meeting), and furthermore, if $\gamma_i^L$ and $\hat\gamma_i^L$ are of the same logarithmic order, so should $\gamma_{i+1}^L$ and $\hat\gamma_{i+1}^L$ be.
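More formally, let us define, for every $L \in \mathbb{N}$ and $j \ge 1$,

$$\zeta_j^L := \frac{\log\gamma_j^L}{2\log L}\,\mathbf{1}_{\{\gamma_j^L < L^{2(1-\alpha)}/(\log L)\}} + \infty\cdot\mathbf{1}_{\{\gamma_j^L \ge L^{2(1-\alpha)}/(\log L)\}},$$

and $\hat\zeta_j^L$ in a similar manner. Our goal is to show that for each $j$, the vectors $V_j^L := (\zeta_1^L, m_1^L, \dots, \zeta_j^L, m_j^L)$ and $\hat V_j^L := (\hat\zeta_1^L, \hat m_1^L, \dots, \hat\zeta_j^L, \hat m_j^L)$ converge in distribution as $L \to \infty$ to the same random vector, whose law is obtained by successive uses of Proposition 3.6. Thus, let us prove by recursion that the distribution functions of the two vectors converge to the same limit. The case $j = 1$ is a direct consequence of Proposition 3.6, which shows that for any $s \in [\beta, 1]$ and $i_1 \neq i_2$, the limits

$$\lim_{L\to\infty}\mathbb{P}_{a_1}\big[\zeta_1^L < s - \alpha\,;\, m_1^L = i_1i_2\big] \qquad \text{and} \qquad \lim_{L\to\infty}\mathbb{P}_{a_1}\big[\zeta_1^L = \infty\,;\, m_1^L = i_1i_2\big]$$

exist, are given explicitly by Proposition 3.6, and coincide with the corresponding limits for $(\hat\zeta_1^L, \hat m_1^L)$. (Recall the analysis made at the beginning of the proof, according to which the lineages meet before time $L^{2(\beta-\alpha)}$ with probability tending to zero, and at that time they are all at pairwise distances of order $L^{\beta-\alpha}$.)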
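Suppose the distribution functions of $V_j^L$ and $\hat V_j^L$ converge to the same (non-degenerate) limit as $L$ tends to infinity. Let then $s \in [\beta, 1]$, $i_1 \neq i_2$, and let $B$ be an event of the form $\{\zeta_1^L < s_1 - \alpha\,;\, m_1^L = l_1^1l_2^1\,;\, \dots\,;\, \zeta_j^L < s_j - \alpha\,;\, m_j^L = l_1^jl_2^j\}$ for some given $\beta \le s_1 \le \dots \le s_j \le s$. Using the strong Markov property of $\mathcal{A}^L$ at time $2\rho_L\gamma_j^L = 2\rho_LL^{2\zeta_j^L}$ and recalling the definition of $T_1^L$ as the first time two rescaled lineages come at distance less than $2R_B$ of each other, we obtain

$$\begin{aligned}
\mathbb{P}_{a_1}\big[V_j^L \in B\,;\, \zeta_{j+1}^L < s - \alpha\,;\, m_{j+1}^L = i_1i_2\big] &= \mathbb{E}_{a_1}\Big[\mathbf{1}_{\{V_j^L \in B\}}\,g_{i_1i_2}\big(\zeta_j^L, s - \alpha\big)\Big]\\
&\quad + \mathbb{E}_{a_1}\Big[\mathbf{1}_{\{V_j^L \in B\}}\Big\{\mathbb{P}_{\mathcal{A}^L(2\rho_L\gamma_j^L)}\big[T_1^L < L^{2(s-\alpha)} - 2L^{2\zeta_j^L}\,;\, m_1^L = i_1i_2\big] - g_{i_1i_2}\big(\zeta_j^L, s - \alpha\big)\Big\}\Big], \qquad (29)
\end{aligned}$$

where $g_{i_1i_2}(\cdot, s - \alpha)$ denotes the limiting probability provided by Proposition 3.6.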



Since $V_j^L$ converges in distribution to $V_j^\infty$ as $L \to \infty$, and since the law of $V_j^\infty$ does not charge the boundary of $B$, the first term in the right-hand side of (29) converges to

$$\mathbb{E}\Big[\mathbf{1}_{\{V_j^\infty \in B\}}\,g_{i_1i_2}\big(\zeta_j^\infty, s - \alpha\big)\Big].$$

For the second term in (29), we already saw that, up to an asymptotically vanishing error term, we can insert the indicator function of the set $\{\mathcal{A}^L(2\rho_L\gamma_j^L) \in \Gamma(L, 4, \zeta_j^L + \alpha)\}$ within the expectation, where $\Gamma(L, 4, \eta)$ was defined earlier as the set of all configurations of four lineages in which all pairwise distances between the locations of the lineages belong to $[L^{\eta}/(\log L), L^{\eta}\log L]$. Now, we can also replace the first probability within the curly brackets by the probability that $T_1^L < L^{2(s-\alpha)}$ and $m_1^L = i_1i_2$, by Lemma 3.1. Then, the uniform convergence stated in Proposition 3.6 easily gives us that the second term in the right-hand side of (29) tends to 0 as $L \to \infty$. Likewise, as $L$ tends to infinity,



$$\mathbb{P}_{a_1}\big[V_j^L \in B\,;\, \zeta_{j+1}^L = \infty\,;\, m_{j+1}^L = i_1i_2\big] \longrightarrow \mathbb{E}\Big[\mathbf{1}_{\{V_j^\infty \in B\}}\,\bar g_{i_1i_2}\big(\zeta_j^\infty\big)\Big],$$

where $\bar g_{i_1i_2}$ is the corresponding limiting probability given by Proposition 3.6, and an analogous result can be established when we allow some of the $\zeta_i^L$, $i \le j$ (and so the subsequent ones) to be infinite. Since this convergence holds for all $s$ and $i_1i_2$ as above, we obtain the convergence in law of $V_{j+1}^L$ towards $V_{j+1}^\infty$, whose distribution is determined by the above limits. By the induction principle, for every $j \in \mathbb{N}$ the sequence $(V_j^L)_{L\ge1}$ converges in distribution to a random vector $V_j^\infty$. Since the same arguments apply to $(\hat V_j^L)_{L\ge1}$, the distribution function of $\hat V_j^L$ also converges to that of $V_j^\infty$, and convergence in distribution also holds. As a consequence, coming back to (28), we obtain that for each term of the sum,



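$$\mathbb{P}_{a_1}\big[\gamma_{j-1}^L < L^{2(t-\alpha)}\,;\, m_l^L \notin \{Aa, Bb\}\ \forall\, l \in \{1, \dots, j-1\}\,;\, \gamma_j^L \ge L^{2(t-\alpha)}\big] - \mathbb{P}_{a_1}\big[\hat\gamma_{j-1}^L < L^{2(t-\alpha)}\,;\, \hat m_l^L \notin \{Aa, Bb\}\ \forall\, l \in \{1, \dots, j-1\}\,;\, \hat\gamma_j^L \ge L^{2(t-\alpha)}\big] \longrightarrow 0$$

as $L \to \infty$, and so

$$\limsup_{L\to\infty}\Big|\mathbb{P}_{a_1}\big[t^L > L^{2(t-\alpha)}\big] - \mathbb{P}_{a_1}\big[\hat t_{Aa}^L \wedge \hat t_{Bb}^L > L^{2(t-\alpha)}\big]\Big| \le 4\varepsilon.$$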
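Since $\varepsilon$ was arbitrary, this limit is actually zero. But $\hat{\mathcal{A}}^L$ is a system of four independent lineages, and so

$$\mathbb{P}_{a_1}\big[\hat t_{Aa}^L \wedge \hat t_{Bb}^L > L^{2(t-\alpha)}\big] = \mathbb{P}_{a_1}\big[\hat t_{Aa}^L > L^{2(t-\alpha)}\big] \times \mathbb{P}_{a_1}\big[\hat t_{Bb}^L > L^{2(t-\alpha)}\big] \longrightarrow \Big(\frac{\beta-\alpha}{t-\alpha}\Big)^2$$

by Proposition 1.2. This concludes the proof of Theorem 1.4, a).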

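The limit obtained in Theorem 1.4, a) is the square of the two-lineage limit of Proposition 1.2, which is exactly what asymptotic independence of the two loci predicts. A tiny helper, with hypothetical names and assuming the regime $\alpha < \beta \le t \le 1$, evaluates both expressions.

```python
def pair_limit(alpha, beta, t):
    """(beta - alpha) / (t - alpha): the two-lineage limit of Proposition 1.2,
    assuming alpha < beta <= t <= 1."""
    if not (alpha < beta <= t <= 1.0):
        raise ValueError("expected alpha < beta <= t <= 1")
    return (beta - alpha) / (t - alpha)

def four_lineage_limit(alpha, beta, t):
    """Limit in Theorem 1.4 a): the two loci decorrelate, so the
    survival probabilities of the pairs (A, a) and (B, b) multiply."""
    return pair_limit(alpha, beta, t) ** 2

if __name__ == "__main__":
    alpha, beta = 0.5, 0.75
    for t in (0.75, 0.875, 1.0):
        print(t, pair_limit(alpha, beta, t), four_lineage_limit(alpha, beta, t))
```

At $t = \beta$ both limits equal 1, consistent with the fact that no coalescence occurs before time $L^{2(\beta-\alpha)}$ with probability tending to one.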
The arguments for the case b) are very similar, using this time Lemma B for a bound on the probability that some lineages meet during a small interval of time, Lemma C for the distance separating the other lineages when two of them meet and merge, and setting $\zeta_j^L := \gamma_j^L/(L^{2(1-\alpha)}\log L)$. □

The proof of Theorem 1.5 uses essentially the same arguments, except that now, before time $\rho_LL^{2(\gamma-\alpha)}$, we cannot use Proposition 4.4 and the lineages starting within the same individual are still highly correlated. In fact, because recombination acts on a linear timescale whereas ancestral relations evolve on an exponential timescale, the proof will show that a phase transition occurs: during a first phase, recombination does not act and so the ancestral lines of the two loci of the same individual are not yet separated, and at time $\rho_LL^{2(\gamma-\alpha)}$ recombination appears in the picture and is quick enough to fully decorrelate the genealogies at the two loci.

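Proof of Theorem 1.5. The case a) is a consequence of the result for two lineages. Indeed, if Condition (3) is fulfilled, then necessarily $(\log\rho_L)/(r_L\rho_L)$ tends to infinity and for any $\varepsilon > 0$ there exists $L_0(\varepsilon)$ such that for every $L \ge L_0(\varepsilon)$,

$$\frac{\log\rho_L}{r_L\rho_L} \ge L^{2(\gamma-\alpha)-\varepsilon}.$$

Hence, since we assumed that $\rho_L$ is bounded by a constant times a power of $L$ (so that $\log\rho_L \le c'\log L$ for some constant $c'$), we have for $t \in [\beta, \gamma)$, $\varepsilon := \gamma - t$ and $L \ge L_0(\varepsilon)$:

$$r_L\rho_LL^{2(t-\alpha)} \le \log\rho_L\;L^{2(t-\alpha-\gamma+\alpha)+(\gamma-t)} \le c'\,L^{-(\gamma-t)}\log L \longrightarrow 0 \quad \text{as } L \to \infty.$$

Therefore, with probability tending to one, no recombinations occur by time $\rho_LL^{2(t-\alpha)}$ and $\mathcal{A}^L$ boils down to a system of two lineages, one ancestral to each of the two individuals sampled. Proposition 1.2 enables us to conclude.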

If t = 7 and r^/OiL^^'^"") does not tend to zero (otherwise recombination is too slow and 
the same argument as above applies), then the probability that there is no coalescence by 
time r^^/(logL) tends to {(3 — a) / {j — a). Indeed, the recombination rate on the modified 
timescale is of the order of r^pi, and so with high probability no recombinations separate 
the two loci in any of our two sampled individuals before time (r^pL)'^ /(log L). Moreover, 

log {{rLPL)-^/ {log L)) log(^) -loglogpi-loglogL 

] 7 = ] 7 > 2(7 - a) as L ^ oo, 

log L log L 

hence by Proposition 11.21 (see also Remark 14. 7p , the probability that no coalescence occurs 
before r^^/(logL) tends to {(3 — a)/ — a). The last step is to observe that, again by 
Proposition 11.21 and Remark 14.71 the probability that any of the pairs of lineages Aa and 
Bb (considered separately) coalesces during the time interval [r^"^/(log L), p^L^^'''"")] tends 
to as L tends to infinity. 

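For b), apply the Markov property at time $\psi_L := \rho_L\big(L^{2(\gamma-\alpha)} \vee (\log L)^5(1 + 2\theta_2\log(\rho_L\log L)/(r_L\rho_L))\big)$:

$$\begin{aligned}
\mathbb{P}_{a_1}\big[t^L > L^{2(t-\alpha)}\big] &= \mathbb{E}_{a_1}\Big[\mathbf{1}_{\{T_{Aa}^L \wedge T_{Bb}^L > \psi_L\}}\,\mathbb{P}_{\mathcal{A}^L(\psi_L)}\big[t^L > L^{2(t-\alpha)} - \psi_L/\rho_L\big]\Big]\\
&= \Big(\frac{\gamma-\alpha}{t-\alpha}\Big)^2\,\mathbb{P}_{a_1}\big[T_{Aa}^L \wedge T_{Bb}^L > \psi_L\big] + o(1),
\end{aligned}$$

where the second equality comes from Proposition 4.4, Theorem 1.4 a) and dominated convergence. Now, by the case a) and Remark 4.7,

$$\mathbb{P}_{a_1}\big[T_{Aa}^L \wedge T_{Bb}^L > \psi_L\big] \longrightarrow \frac{\beta-\alpha}{\gamma-\alpha} \quad \text{as } L \to \infty,$$

which yields the desired result.

Case c) is identical to b). □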